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Abstract 

The  Air  Force’s  need  for  accurate  navigation  information  is  often  met  through 
the  combination  of  inertial  and  global  positioning  systems.  However,  the  increased 
use  of  smaller  and  smaller  unmanned  aerial  systems  opens  the  possibility  of  flight  in 
environments  where  satellite  navigation  signals  are  either  significantly  degraded  or 
unavailable  entirely,  such  as  indoors,  in  dense  urban  areas,  and  underground.  Fortu¬ 
nately,  the  intrinsic  orthogonal  structure  of  man-made  environments  can  be  exploited 
to  aid  in  determining  a  vehicle’s  attitude  when  satellite  signals  are  unavailable.  This 
research  aims  to  obtain  accurate  and  stable  estimates  of  a  vehicle’s  attitude  by  cou¬ 
pling  consumer-grade  inertial  and  optical  sensors.  This  goal  is  pursued  by  first  mod¬ 
eling  both  inertial  and  optical  sensors  and  then  developing  a  technique  for  identifying 
vanishing  points  in  perspective  images  of  a  structured  environment.  The  inertial 
and  optical  processes  are  then  coupled  to  enable  each  one  to  aid  the  other.  The 
vanishing  point  measurements  are  combined  with  the  inertial  data  in  an  extended 
Kalman  filter  to  produce  overall  attitude  estimates.  This  technique  is  experimentally 
demonstrated  in  an  indoor  corridor  setting  using  a  motion  profile  designed  to  simu¬ 
late  flight.  Through  comparison  with  a  tactical-grade  inertial  sensor,  the  combined 
consumer-grade  inertial  and  optical  data  are  shown  to  produce  a  stable  attitude  solu¬ 
tion  accurate  to  within  1.5  degrees.  A  measurement  bias  is  manifested  which  degrades 
the  accuracy  by  up  to  another  2.5  degrees. 
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Coupling  Vanishing  Point  Tracking 


with  Inertial  Navigation  to 
Estimate  Attitude  in  a  Structured  Environment 

I.  Introduction 

The  United  States  Air  Force  depends  on  precision  navigation  to  accomplish  its 
mission.  To  help  fulfill  this  need,  many  of  the  air  vehicles  used  by  the  Air 
Force  are  equipped  with  inertial  navigation  systems  (INSs).  While  these  inertial 
systems  provide  useful  information  regarding  position  and  attitude  in  the  short  term, 
even  the  most  advanced  are  subject  to  ever-increasing  errors.  Conventionally,  these 
errors  are  mitigated  by  augmenting  the  inertial  system  with  information  from  another 
source  such  as  the  global  positioning  system  (GPS).  Unfortunately,  the  absence  of 
an  alternate  technology  for  constraining  inertial  error  growth  induces  a  dependency 
on  the  GPS.  The  Chief  of  Staff  of  the  Air  Force  has  addressed  this  dependency, 
stating,  “It  seems  critical  to  me  that  the  Joint  force  should  reduce  its  dependence 
on  GPS-aided  precision  navigation  and  timing,  allowing  it  to  ultimately  become  less 
vulnerable,  yet  equally  precise,  and  more  resilient”  [23]. 

1.1  Unmanned  Aerial  Vehicles 

Unmanned  aerial  vehicles  (UAVs)  are  among  the  assortment  of  military  assets 
which  have  come  to  rely  on  the  GPS  for  precision  navigation.  Such  vehicles  have 
proven  effective  at  providing  tactical  advantages  in  the  recent  conflicts  on  Iraq  and 
Afghanistan  and  will  likely  continue  to  be  a  critical  part  of  the  United  States  arsenal 
for  years  to  come.  Some  of  these  vehicles,  such  as  the  RQ-11  Raven,  are  small  enough 
to  be  carried  and  hand-launched  by  a  single  soldier.  As  smaller  and  smaller  UAVs  are 
developed,  they  will  become  capable  of  flight  in  environments  which  have  historically 
been  unopen  to  aerial  vehicles,  such  as  inside  buildings  or  underground. 
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1.1.1  Quadrotors.  Rotary-wing  vehicles  have  certain  advantages  over  fixed- 
wing  aircraft  when  it  comes  to  operating  in  tight  quarters.  Specifically,  slow  flight 
and  hover  capabilities  eliminate  the  need  to  maintain  forward  velocity  to  produce  lift. 
One  form  of  miniature  rotary  vehicle  that  has  become  more  common  for  scientific 
research  in  recent  years  is  the  quadrotor  [16].  ra-  These  vehicles  use  four  counter¬ 
rotating  fixed-pitch  propellers  to  generate  lift.  Desired  motion  is  obtained  by  simply 
varying  the  propellers’  rotational  speeds.  Accurate  attitude  knowledge  is  requisite 
to  controlling  a  quadrotor,  as  any  rotation  is  immediately  converted  to  translational 
motion  by  the  vehicle’s  dynamics. 

To  facilitate  navigational  research,  the  Advanced  Navigation  Technology  (ANT) 
Center  at  the  Air  Force  Institute  of  Technology  (AFIT)  has  developed  several  air 
vehicle  platforms,  including  quadrotors.  One  such  vehicle  is  depicted  in  Figure  11.11 
This  vehicle  is  equipped  with  a  lightweight  micro  electro-mechanical  systems  (MEMS) 
inertial  measurement  unit  (IMU)  and  a  commercial  webcam  which  can  be  used  to 
augment  the  inertial  data  in  the  place  of  a  GPS  receiver. 


Figure  1.1:  Quadrotor  (bottom  view).  The  quadrotor  shown  here  was  developed  by 
the  ANT  Center  as  a  research  platform  for  vision-based  aerial  navigation  solutions. 
It  is  equipped  with  both  a  MEMS  inertial  unit  and  commercial  webcam. 
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1.2  Indoor  Operation 

The  indoor  operating  environment  poses  distinct  challenges  to  aerial  vehicles. 
Tight  quarters  provide  little  room  for  maneuvering,  and  obstructions  abound.  To  add 
to  the  difficulties,  the  GPS  signal  is  substantially  degraded  indoors.  This  research  aims 
to  provide  an  alternative  to  the  GPS  for  augmenting  inertial  attitude  estimates  in  an 
indoor  environment. 

1.2.1  Manhattan  World  Assumption.  Many  man-made  environments  such 
as  dense  urban  or  indoor  settings  have  a  consistent,  orderly  structure  which  can 
be  exploited  to  obtain  attitude  information  from  an  optical  sensor.  The  attitude 
estimation  technique  described  in  this  thesis  is  intended  for  use  in  such  environments. 
The  following  assumptions  regarding  the  structure  of  the  indoor  environment  in  which 
an  aerial  vehicle  is  to  operate  are  used  to  enable  the  techniques  described  herein: 

•  All  floors  and  ceilings  are  flat  and  level. 

•  All  walls  are  flat  and  vertical. 

•  All  rooms  and  corridors  meet  at  right  angles. 

Structural  features  of  such  an  environment  will  align  to  an  orthogonal  three-dimensional 
(3-D)  grid  described  as  the  “Manhattan  world”  in  [?]. 

1.2.2  Vanishing  Points.  Most  of  the  lines  defining  the  edges  and  intersec¬ 
tions  of  planar  surfaces  in  the  Manhattan  world  are  aligned  to  one  of  three  mutually 
orthogonal  directions,  forming  large  groups  of  mutually  parallel  lines.  All  of  the  lines 
in  any  of  these  groupings  will  appear  to  converge  at  a  single  point,  called  a  vanishing 
point,  in  perspective  images  of  a  Manhattan  world  scene.  The  positions  of  these  van¬ 
ishing  points  in  an  image  are  shown  in  [11]  to  be  invariant  to  translational  motion  of 
the  camera,  and  only  change  when  the  camera  is  rotated. 
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1.3  Problem  Formulation 


The  problem  that  we  are  trying  to  solve  is  this:  attitude  estimates  provided 
by  the  IMU  aboard  our  vehicle  are  subject  to  drift,  which  leads  to  an  unbounded 
increase  in  attitude  error.  We  wish  to  use  inertial  data  to  aid  in  finding  vanishing 
points  in  perspective  images  of  a  Manhattan  world  environment  and,  in  turn,  use 
the  positions  of  these  vanishing  points  to  constrain  the  long-term  drift  in  the  inertial 
attitude  estimates.  This  will  be  accomplished  by  combining  the  inertial  and  optical 
data  in  an  extended  Kalman  filter. 

1.4  Research  Contributions 

The  primary  contribution  of  this  research  is  a  deeply  coupled  vanishing  point 
and  inertial  attitude  estimation  method.  The  tight  coupling  of  the  inertial  and  optical 
sensors  used  for  this  research  results  in  an  overall  attitude  solution  unattainable  using 
either  sensor  alone.  Using  the  methods  described  herein,  a  drift-free  attitude  solution 
accurate  to  within  1.5  degrees  1-cr  and  biased  by  only  2  degrees  is  obtained  using  a 
pair  of  small,  inexpensive,  light-weight,  low-power  sensors. 

1 . 5  Outline 

The  remainder  of  this  thesis  is  organized  in  the  following  manner.  Chapter  |TT] 
discusses  the  mathematical  and  topical  backgrounds  behind  inertial  navigation  and 
optical  sensors,  as  well  as  other  researchers’  key  contributions  related  to  them.  Chap¬ 
ter  mu  outlines  the  manner  in  which  the  coupled  inertial  and  optical  attitude  estima¬ 
tion  technique  presented  herein  was  developed  and  the  experimental  procedures  used 
to  evaluate  it.  The  experimental  results  are  presented  and  analyzed  in  Chapter  II VI 
and  conclusions  are  made  with  a  few  suggestions  for  future  work  in  Chapter  [V] 
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II.  Background 


This  chapter  outlines  the  concepts  one  must  understand  in  order  to  implement  the 
attitude  estimation  technique  described  in  this  thesis.  The  notational  conven¬ 
tions  used  herein  are  presented  in  Section  12.11  followed  by  basic  concepts  in  terrestrial 
navigation  in  Section  12.21  Next,  inertial  navigation  techniques  and  their  associated 
limitations  are  described  in  Section  12.31  Computer  vision  techniques  will  be  used  to 
aid  the  navigation  solution  provided  by  an  inertial  sensor,  so  concepts  in  digital  imag¬ 
ing  and  image  processing  are  presented  in  Sections  12.41  and  12.51  Finally,  methods  for 
combining  data  from  multiple  sources  are  described  in  Section  [2761  Other  researchers’ 
contributions  to  the  body  of  knowledge  in  these  subject  areas  are  also  briefly  discussed 
throughout. 

2.1  Notation 

Certain  conventions  are  used  throughout  the  body  of  this  work  pertaining  to 
how  variable  quantities  are  represented.  These  conventions  are  intended  to  help  the 
reader  understand  the  quantities  expressed  in  the  equations,  figures,  tables,  and  text 
of  this  work,  and  are  as  follows: 

Scalars:  Scalars  are  represented  by  either  upper  or  lowercase  characters  in  italics, 
e.g.,  a  or  A. 

Vectors:  Vector  quantities  are  represented  by  lowercase  characters  in  bold,  e.g.,  a 
or  xp.  Unless  specifically  stated  otherwise,  all  vectors  should  be  interpreted  as 
column  vectors. 

Vector  Components:  The  scalar  components  of  a  vector  are  represented  with  sub¬ 
scripts  indicating  their  corresponding  axes,  e.g.,  the  x-component  of  the  vector 
a  is  represented  as  ax. 

Homogeneous  Vectors:  Homogeneous  vectors  are  defined  to  have  a  final  compo¬ 
nent  equal  to  1  and  are  represented  with  an  underscore,  e.g.,  a. 
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Skew-symmetric  Matrices:  The  skew-symmetric  matrix  form  of  3-vectors  is  some¬ 
times  useful  for  mathematical  computations  involving  matrices  and/or  vector 
cross  products.  Vectors  represented  in  skew-symmetric  matrix  form  are  followed 
by  the  “cross”  character,  e.g.,  ax.  A  vector  a  with  components  ax,  ay,  and  az 
is  described  in  skew-symmetric  matrix  form  as  shown  in  Equation  (12.111. 


ax 


0  -az  ay 
az  0  -ax 
~~Ojy  CLX  0 


(2.1) 


Matrices:  Matrices  are  represented  by  uppercase  characters  in  bold,  e.g.,  A  or  tE 

Estimated  Variables:  Variables  which  represent  an  estimate  of  a  particular  quan¬ 
tity  are  represented  with  the  “hat”  character,  e.g.,  a. 

Corrupted  Variables:  Variables  which  are  corrupted  by  errors  are  represented  with 
the  “tilde”  character,  e.g.,  a. 

Relative  Motion:  When  relative  motion  is  specified,  combined  subscripts  are  added 
to  a  vector  to  describe  the  motion,  e.g.,  pab  represents  the  relative  position  from 
frame  a  to  frame  b. 

A  Priori  and  A  Posteriori  Estimates:  Within  a  Kalman  filter,  it  is  necessary  to 
distinguish  between  two  estimates  of  a  random  variable’s  mean  and  uncertainty 
which  are  held  at  the  same  instant  in  time-the  a  priori  estimate  that  is  deter¬ 
mined  without  incorporating  new  information  from  a  measurement  update  and 
the  a  posteriori  estimate  which  does  incorporate  the  measurement  information. 
In  such  instances,  a  “minus”  character  superscript  is  added  to  the  time  variable 
for  a  priori  estimates,  and  a  “plus”  character  superscript  on  the  time  variable 
indicates  a  posteriori  estimates,  e.g.,  a(t~ )  or  a(t+). 
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2. 2  Reference  Frames 

Navigation  information  is  quantified  using  a  defined  reference  frame.  This  ref¬ 
erence  frame  is  used  to  describe  position  relative  to  a  point  or  surface  and  to  give 
a  mathematical  realization  to  vector  quantities  such  as  velocity,  acceleration  and 
torque.  A  particular  vector  will  have  different  mathematical  realizations  in  different 
coordinate  systems,  although  the  vector  itself  is  unchanged.  Throughout  this  text, 
the  reference  frame  in  which  a  vector  is  expressed  is  denoted  with  a  superscript.  For 
example,  the  vector  y  expressed  in  the  Earth-centered,  Earth- fixed  (ECEF)  refer¬ 
ence  frame  would  appear  as  ye.  Common  reference  frames  used  in  navigation  and 
computer  vision  include  the  following: 

Earth-fixed  inertial  frame  (f-frame)  -  The  origin  is  fixed  at  the  center  of  the 
Earth,  with  the  x  and  y- axes  on  the  equatorial  plane  and  the  2-axis  along  the 
Earth’s  axis  of  rotation.  The  i-frame  does  not  spin  with  the  Earth  but  does 
follow  the  Earth’s  orbit  around  the  sun.  Though  it  is  not  a  true  inertial  frame, 
for  the  sake  of  terrestrial  navigation  it  can  be  considered  as  such. 

Earth-centered  Earth- fixed  frame  (e-frame)  -  The  origin  is  fixed  at  the  center 
of  the  Earth,  with  the  x-axis  on  the  equatorial  plane  pointing  to  the  prime 
meridian,  2-axis  parallel  to  the  Earth’s  axis  of  rotation,  and  y-axis  located  so  as 
to  form  a  right-handed  orthogonal  triad.  Unlike  the  i-frame,  the  e-frame  spins 
along  with  the  Earth. 

Earth-fixed  Navigation  frame  (n-frame)  -  This  is  a  locally  defined  reference  frame 
with  its  origin  determined  arbitrarily.  The  origin  is  typically  fixed  to  the  Earth’s 
surface  with  the  x-axis  pointing  north,  y-axis  pointing  east  and  2-axis  pointing 
down.  Often,  it  is  also  called  the  North-East-Down  (NED)  frame. 

Body  frame  (6-frame)  -  The  origin  is  usually  either  at  the  center  of  gravity  (eg)  of 
a  moving  vehicle  or  at  the  center  of  a  triad  of  inertial  sensors.  For  aircraft,  it 
is  typically  defined  with  the  x-axis  out  the  nose,  y-axis  out  the  right  wing,  and 
2- axis  out  the  belly,  as  shown  in  Figure  [2721 
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Greenwich 

Meridian 


Figure  2.1:  Common  coordinate  systems.  The  Earth-fixed  inertial,  Earth- centered 
Earth-fixed,  and  navigation  frames  are  shown.  The  origins  of  the  inertial  and  Earth 
frames  are  at  the  Earth’s  center  of  mass  while  the  origin  of  the  navigation  frame  is 
fixed  on  the  Earth’s  surface.  (Figure  taken  from  [31]) 


Figure  2.2:  Body  reference  frame.  For  aircraft,  the  b-frame  is  oriented  with  the 

rc-axis  out  the  nose,  y- axis  out  the  right  wing  and  £-axis  out  the  belly.  (Figure  taken 
from  [131]) 
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Camera  frame  (c-frame)  -  The  origin  is  at  the  optical  center  of  the  camera,  with 
the  x-axis  pointed  upward,  j/-axis  to  the  right  and  ^-axis  out  the  lens,  as  shown 
in  Figure  12.31 


X 


c 


A 


Figure  2.3:  Camera  reference  frame.  The  x-axis  is  upward  in  the  images  captured, 
y- axis  to  the  right,  and  z-axis  out  the  lens. 

Image  frame  (pzx-frame)  -  Unlike  the  other  frames  mentioned,  the  pzx-frame  has 
only  two  dimensions.  The  origin  is  beyond  the  upper-left  pixel  of  a  digital  image, 
but  multiple  conventions  exist  throughout  image  processing  literature  for  pixel 
indexing  and  axis  orientation.  In  this  work,  images  are  indexed  according  to 
the  matrix  storage  format  used  by  The  Mathworks,  Inc.’s  Matlab  software,  with 
the  upper  left  pixel  indexed  as  (1,1),  the  x-axis  down  the  left  side  and  y- axis 
across  the  top  of  the  image. 

The  navigation  described  in  this  thesis  will  be  defined  with  reference  to  a  local  level 
Earth-fixed  navigation  frame  with  its  origin  on  the  floor  of  an  indoor  corridor  centered 
between  the  walls,  the  x-axis  along  the  length  of  the  corridor  and  the  z-axis  pointed 
down  as  depicted  in  Figure  [2~fl 
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Figure  2.4:  Local  level  navigation  frame.  The  local  level  navigation  frame  depicted 
here  is  used  as  a  reference  for  defining  attitude  throughout  this  research. 


2.2.1  Coordinate  Transformations.  Often,  it  is  necessary  to  convert  vector 
quantities  from  one  coordinate  system  to  another.  This  is  accomplished  by  per¬ 
forming  a  vector  transformation.  There  are  two  possible  ways  in  which  right-hand 
orthogonal  reference  frames  may  differ  from  one  another  at  a  particular  instant  in 
time-translation  and  rotation. 


2.2. 1.1  Translation.  If  the  origins  of  two  reference  frames  are  not 
collocated,  a  position  vector  is  used  to  describe  the  position  of  one  with  respect  to 
the  other.  In  the  case  where  two  reference  frames  have  principle  axes  parallel  to 
one  another  but  with  spatially  separated  origins,  any  vector  in  one  frame  will  have 
the  same  mathematical  description  in  the  other.  The  coordinates  of  a  fixed  point, 
however,  will  differ  between  the  two  representations.  The  conversion  of  a  particular 
coordinate  triad  from  one  frame  to  the  other  is  described  by  Equation  fl2.2[).  where  pb 
represents  the  position  of  P  from  the  origin  of  frame  b  and  pa  represents  its  position 
from  the  origin  of  frame  a  as  depicted  in  Figure  [2751 


Pb  =  Pa~  Pab 


(2.2) 
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Figure  2.5:  Position  vectors  in  two  different  but  parallel  coordinate  systems.  The 
position  of  point  P  from  the  origin  of  the  6-frame,  pb,  is  equal  to  the  position  of 
point  P  from  the  origin  of  the  a-frame,  pa,  minus  the  relative  position  vector  between 
frames  a  and  6,  pab. 

2.2. 1.2  Rotation.  If  two  reference  frames  are  oriented  such  that  one 
or  more  of  the  principle  axes  are  not  co-directional,  then  there  is  a  relative  rotation 
between  them.  There  are  several  ways  to  represent  the  relative  rotation  between 
reference  frames.  Two  of  those  are  direction  cosine  matrices  (DCMs)  and  Euler  angles. 

2.2. 1.2.1  Direction  Cosine  Matrices.  To  use  a  direction 
cosine  matrix  to  transform  a  vector  from  one  frame  to  another,  the  vector  is  pre- 
multiplied  by  the  DCM.  The  symbol  “(7”  is  commonly  used  to  represent  a  DCM, 
with  a  subscript  representing  the  originating  coordinate  system  and  a  superscript 
representing  the  destination  coordinate  system.  As  an  example,  the  representation  in 
a  reference  frame  b  of  a  vector  y  could  be  determined  from  its  representation  in  frame 
a  as  shown  in  Equation  (12.31). 

yb  =  Cbaya  (2.3) 

A  DCM’s  dynamics  are  described  by  the  following  differential  equation: 

c\  =  C*K„x]  (2.4) 

One  key  property  of  DCMs  is  that  the  determinant  is  always  equal  to  one.  This 
ensures  that  a  vector’s  magnitude  is  preserved  when  it  is  multiplied  by  the  DCM. 
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Since  the  determinant  is  nonzero,  this  property  also  ensures  that  an  inverse  DCM 
exists.  If  the  DCM  to  convert  from  one  frame  to  another  is  known,  the  DCM  to 
convert  back  is  simply  the  inverse  of  the  first,  as  shown  in  Equation  (I2.5[).  Another 
convenient  property  is  that  the  inverse  of  a  DCM  is  always  its  transpose,  which  greatly 
simplifies  the  mathematics  in  computing  inverse  DCMs. 

Ci  =  [C‘]-‘  =  [C‘f  (2.5) 

2. 2. 1.2. 2  Euler  Angles.  Another  way  that  relative  rotation 
between  reference  frames  can  be  expressed  is  through  the  use  of  Euler  angles.  These 
angles  represent  the  following  three  successive  rotations: 

1.  A  rotation  through  angle  if;  about  the  originating  reference  frame’s  2-axis 

2.  A  rotation  through  angle  9  about  the  new  intermediate  reference  frame’s  y- axis, 

y' 

3.  A  rotation  through  angle  </>  about  the  second  intermediate  reference  frame’s 
rc-axis,  x" 

To  obtain  an  equivalent  DCM  from  a  set  of  Euler  angles,  the  Euler  angles  are  each 
represented  as  a  separate  DCM,  and  the  total  rotation  is  the  product  of  all  three  as 
shown  in  Equation  (12.6(1 . 
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2.2. 1.3  Transformation  Matrices.  Translation  and  rotation  can  be 
performed  simultaneously  through  the  use  of  homogeneous  vectors  and  a  transforma¬ 
tion  matrix.  Homogeneous  vectors  are  constructed  by  augmenting  a  vector  with  an 
additional  element  equal  to  one.  For  instance,  the  vector  r  with  components  rx,  ry 
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and  rz  would  be  converted  to  the  homogeneous  form  r  as  shown  in  Equation  (I2.7j) . 


T 


r  = 


r„  rz 


(2.7) 


If  the  rotation  between  a  particular  reference  frame  a  and  another  reference  frame  b 
is  described  by  the  DCM  Cba  and  the  translation  from  b  to  a  is  described  by  pba,  then 
the  matrix  Tba  used  to  transform  a  homogeneous  vector  from  frame  a  to  frame  b  is 
given  by  Equation  (12.8)1 . 


rb  =  Tbara 


rrib 

a 


Ci  |  -Ab 

0lx3  |  1 


(2.8) 


2.3  Inertial  Navigation 

Inertial  navigation  relies  on  the  concept  that  starting  from  a  known  position, 
velocity,  and  attitude,  a  vehicle’s  position  and  attitude  can  be  determined  by  sensing 
its  motion,  i.e.,  acceleration  and  rotational  velocity.  A  typical  IMU  consists  of  at  least 
three  accelerometers  that  measure  specific  force  relative  to  the  inertial  frame  and  three 
gyroscopes  (or  gyros)  that  measure  either  rotational  acceleration  or  rotational  velocity 
relative  to  the  inertial  frame. 

There  are  two  general  types  of  IMU-either  platform  or  strapdown.  A  platform 
INS  has  its  sensors  mounted  on  a  platform  that  is  gimbaled  so  as  to  always  have  the 
vertical  sensor  aligned  with  local  gravity,  while  a  strapdown  IMU  is  rigidly  mounted 
to  the  vehicle.  The  accelerometers  and  gyroscopes  of  a  strapdown  IMU  are  typically 
mounted  in  an  orthogonal  triad  with  their  input  axes  parallel  to  the  b- frame’s  principle 
axes  so  as  to  provide  outputs  in  the  body  frame.  Vehicles  like  the  quadrotor  for  which 
the  estimation  method  developed  in  this  thesis  is  intended  are  commonly  equipped 
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with  MEMS  strapdown  IMUs,  due  the  small  size,  weight  and  power  requirements  of 
such  devices.  Titterton  and  Weston  thoroughly  describe  strapdown  inertial  navigation 


m 


2.3.1  Inertial  Attitude  Dynamics.  Since  the  focus  of  this  thesis  is  attitude 
estimation,  inertial  attitude  calculations  are  described  here.  The  quantity  of  interest 
regarding  attitude  is  the  relative  rotation  between  the  vehicle  body  frame  and  the 
navigation  frame  which  can  be  captured  by  the  DCM,  C%.  Applying  the  relationship 
from  Equation  (12.41)  to  this  DCM  gives: 


(2.9) 


Because  the  strapdown  IMU  provides  measurements  of  the  rotation  rate  between  the 
inertial  and  body  frames  expressed  in  the  body  frame,  an  expression  relating  the 
body-to-navigation  frame  DCM  to  this  rotation  rate  is  desired.  Such  an  expression  is 
obtained  by  manipulating  Equation  (12.91). 

The  rotation  rate  between  the  n  and  6-frames  is  the  difference  between  the 
inertial-to-body  frame  rotation  rate,  the  rotation  rate  between  the  n  and  e-frames, 
and  the  Earth’s  sidereal  rate  as  shown  in  Equation  (12.101). 


(2.10) 


Since  the  n-frame  is  fixed  to  the  surface  of  the  Earth,  the  rotation  rate  between  the 
n  and  e-frames  is  always  zero.  This  simplifies  Equation  (12. 10|)  to: 


(2.11) 
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Finally,  converting  Equation  (12. 1 1  j)  to  skew-symmetric  form  and  substituting  into 
Equation  (I2.9[)  yields  the  expression  shown  in  Equation  (12. 12ft. 

Cn„  =  Cf[^x]  -  C"[v‘ex]C‘CZ  (2.12) 

2.3.2  Inertial  Attitude  Errors.  Inertial  system  rate  gyros  are  subject  to 
errors  stemming  from  various  sources.  Some  of  these  include  fixed  and  acceleration- 
dependent  biases,  scale  factor,  sensor  misalignment  and  measurement  noise.  All  of 
these  effects  can  be  combined  in  an  overall  rotation  model  in  which  the  measured 
rotation,  ,  is  a  function  of  the  true  rotation,  c ohib,  a  measurement  bias,  bb  and 
zero-mean,  additive,  white,  Gaussian  noise,  w6. 

“4,  =  <  +  bb  +  w1  (2.13) 

For  the  purposes  of  this  research,  the  bias  will  be  treated  as  a  fixed,  deterministic 
quantity.  Though  the  bias  might  be  more  accurately  described  as  a  first-order  Gauss- 
Markov  process,  the  fixed  deterministic  model  is  justified  considering  the  short  time 
between  image  measurement  updates.  The  strength  of  the  noise  term  is  determined 
experimentally  by  observing  the  error  growth  rates  in  multiple  inertial-only  attitude 
computations. 

Due  to  the  additive  nature  of  the  noise  and  bias  terms  in  Equation  (I2.13[).  the 
attitude  solution  provided  by  an  inertial  system  will  drift  over  time.  The  longer 
the  inertial  system  operates  unaided,  the  larger  the  errors  will  become.  This  slow 
drift  is  often  compensated  for  using  an  additional  sensor  with  higher  frequency  error 
properties  such  as  in  an  embedded  GPS/INS  (EGI).  In  the  case  of  an  indoor  setting 
where  GPS  is  unavailable,  an  alternative  approach  is  required,  such  as  the  vision- 
aiding  described  in  this  thesis. 
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2-4  Digital  Imaging 

Digital  imaging  is  the  process  by  which  an  optical  sensor  converts  luminous 
energy  (light)  into  an  array  of  digital  data.  These  data  can  then  by  interpreted  by 
a  digital  computer  and  displayed  to  an  interested  user  as  a  photograph.  In  order 
for  a  digital  image  to  be  produced,  light  must  originate  from  a  source,  reflect  off  the 
various  elements  of  the  subject,  enter  the  aperture  of  an  optical  sensor  and  stimulate 
the  sensor’s  photoelectric  array.  The  output  from  the  photoelectric  array  is  then 
amplified  and  sampled  by  an  analog-to-digital  (A/D)  converter  to  produce  the  digital 
image.  This  process  is  depicted  in  Figure  12.61 


ILLUMINATION 


Figure  2.6:  Imaging  sensor  diagram.  The  imaging  sensor  interprets  light  reflected 
through  the  optics  as  a  digital  image.  (Figure  taken  from  [51] ) 

2-4-1  Projective  Geometry.  The  process  of  central  projection  renders  a  3-D 
subject  as  a  two-dimensional  (2-D)  image.  We  wish  to  determine  mathematically  how 
to  transform  the  coordinates  of  a  point  expressed  in  the  3-D  camera  frame  into  the 
2-D  image  frame.  With  such  a  model  in  place,  the  data  provided  by  an  imaging  sensor 
can  be  interpreted  as  it  relates  to  the  3-D  scene.  This  is  done  by  modeling  the  effects 
of  central  projection. 

2. 4-1.1  Pinhole  Camera.  One  simple  model  of  centralized  projection 
is  represented  by  a  pinhole  camera.  In  this  model,  all  incoming  light  passes  through 
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the  central  point,  or  optical  center  of  the  camera,  and  is  projected  onto  a  focal  plane 
positioned  at  a  distance  of  one  focal  length  /  behind  the  center  of  projection.  As  shown 
in  Figure  fITX  the  projection  onto  the  focal  plane  is  inverted  by  the  imaging  sensor’s 
optics.  The  pinhole  model  can  be  further  simplified  to  eliminate  the  inversion  by 
positioning  a  virtual  image  plane  one  focal  length  in  front  of  the  center  of  projection. 
This  modified  pinhole  camera  model  is  shown  in  Figure  12771 


Figure  2.7:  Modified  pinhole  camera.  Light  reflected  from  the  subject  crosses  the 
virtual  image  plane  one  focal  length  in  front  of  the  center  of  projection. 


A  point  S  in  the  world  can  be  identified  by  a  position  vector  sc  originating  at 
the  camera  center  of  projection  and  terminating  at  S.  The  vector  spro ■  that  is  co- 
clirectional  with  sc  but  terminates  at  the  intersection  with  the  image  plane  is  a  scalar 
multiple  of  sc.  Because  the  vector  spro ■  terminates  at  the  image  plane,  its  z-coordinate 
must  be  equal  to  /.  Using  the  method  of  similar  triangles,  we  can  then  determine 
that  the  scaling  factor  relating  sc  to  sproj  is  f/s'i,  as  shown  in  Equation  (j2. 14j) 


s 


C 

proj 


(2.14) 


Since  the  image  frame  has  only  two  dimensions,  the  transformation  from  camera 
coordinates  to  image  coordinates  must  discard  the  z-component  of  the  vector  spro ., 
reducing  it  to  a  two-vector.  The  2-D  representation  of  spro ■  is  its  projection  onto 
the  image  plane,  spr<h,  which  originates  from  the  projection  of  the  camera  frame’s 
origin  onto  the  image  plane,  as  shown  in  Figure  12.81  This  reduction  is  performed 
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mathematically  by 


sproj  _ 


1  0  0 
0  1  0 


'proj 


(2.15) 


Now  that  s  has  been  expressed  in  a  2-D  frame  that  is  coplanar  with  the  image  frame, 
the  methods  discussed  in  Section  12.2.11  can  be  applied  to  complete  the  transformation 
into  the  image  frame. 


Figure  2.8:  Image  plane.  The  image  plane  has  physical  dimensions  of  H  x  W  and 
pixel  dimensions  of  M  x  N.  The  camera  frame  x  and  y- axes  project  onto  the  image 
plane  centered  at  the  coordinates  ^rr-).  (Figure  taken  from  [31]) 


There  is  a  relative  rotation  between  the  image  frame  and  projected  camera 
frame  which  can  be  captured  by  a  rotation  matrix.  Multiplying  the  ^-coordinate  by 


-1  will  rotate  a  vector  from  the  projected  camera  frame  to  the  image  frame  as  shown 


in  Equation  (j2.16|). 
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(2.16) 
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There  is  also  a  relative  scaling  factor  in  each  direction.  The  focal  plane  has  a 
vertical  dimension  of  H  in  the  camera  frame  and  a  vertical  dimension  of  M  in  the 
image  frame.  This  leads  to  a  scaling  factor  of  M/H  in  the  x-direction.  Similarly  the 
focal  plane  has  a  horizontal  dimension  of  W  in  the  camera  frame  and  a  horizontal 
dimension  of  N  in  the  image  frame.  This  leads  to  a  scale  factor  of  N /W  in  the  y- 
direction.  These  scale  factors  appear  on  the  diagonal  of  the  transformation  matrix 
shown  in  Equation  (12.171). 


Lastly,  there  is  a  relative  translation  between  the  image  frame  and  the  projected 
camera  frame.  The  origin  of  the  camera  frame  projects  onto  the  focal  plane  at  the 
coordinates  in  the  image  frame,  which  give  the  relative  translation  between 

the  two  frames.  This  projection  of  the  optical  center  onto  the  focal  plane  is  known 
as  the  principal  point.  The  transformation  of  scpro ■  into  the  image  frame  is  now  given 
by  Equation  (12.171) 
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Substituting  the  right  hand  side  of  Equation  (j2. 14[)  into  Equation 
homogeneous  coordinates  gives  the  total  transformation  from  the 
the  image  frame. 


(2.17) 

(I2.17P  and  using 
camera  frame  to 


The  transformation  matrix,  Tjf"x ,  is  given  by: 


(2.18) 
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(2.19) 


and  is  known  as  the  intrinsic  camera  matrix. 


Since  an  imaging  sensor  is  to  be  used  to  aid  in  determining  attitude,  an  inverse 
transformation  is  also  needed.  This  inverse  transformation  converts  a  pair  of  pixel 
coordinates  in  the  image  frame  into  a  vector  pointing  from  the  center  of  the  camera  to 
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that  pixel  in  the  camera  frame.  The  loss  of  dimension  that  occurs  when  an  image  is 
produced  prevents  the  vector  sc  from  being  fully  determined  by  the  inverse  transfor¬ 
mation.  Instead,  pre-multiplying  Equation  (|2.18j)  by  the  inverse  of  the  transformation 
matrix  will  only  yield  the  homogeneous  three-vector  sc  that  is  co-directional  with  sc 
as  shown  in  Equation  (12.2011. 


s 


C 


rpC  pix 
pix_ 


(2.20) 


rjiC  _  rrjipiXl—  1 

pix  —  L  c  \ 


H  n  H(M+ 1) 

fM  U  2/M 

n  w_  W(N+ 1) 
U  fN  2  fN 


0  0  1 


(2.21) 


2-4-2  Camera  Calibration.  Beyond  the  simple  pinhole  camera  model  just 
described,  other  nonlinear  distortions  are  present  in  any  imaging  system.  These  dis¬ 
tortions  give  rise  to  the  need  of  a  calibration  procedure  which  compensates  for  their 
effects.  Such  a  procedure  begins  by  modeling  the  distortion. 


2-4-2. 1  Radial  Distortion.  The  most  visible  form  of  distortion  is  radial 
distortion  which  causes  the  projections  of  straight  lines  to  appear  curved  in  images. 
This  distortion  occurs  when  the  sensor’s  optics  non-uniformly  magnify  the  image. 
In  [4],  Brown  models  this  distortion  as  a  power  series  of  r,  the  Euclidean  length  of 
sproj ,  as  shown  in  Equation  (12.221).  where  dCadl  is  the  radial  distortion  factor  and  fcj 
are  constant  coefficients. 


d(rad)  =  (i  +  klr2  +  k2r4  +  k3r6)  (2.22) 

r2  =  (gproi)  2  +  (sprojy  (2.23) 

2. 4-2. 2  Tangential  Distortion.  Tangential  distortion  causes  the  princi¬ 
pal  point  c  to  be  positioned  away  from  the  geometric  center  of  the  image  plane.  This 
distortion  arises  from  1)  imperfections  in  lens  manufacture  that  cause  the  centers 
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of  curvature  of  the  front  and  back  surfaces  to  not  be  collincar  and  2)  misalignment 
between  the  imaging  sensor’s  optics  and  photosensitive  array.  Brown  models  this  dis¬ 
tortion  as  the  vector  function  of  r,  sPx° J  and  Syroj  shown  in  Equation  (12.241),  where 
c?(tan)  is  the  tangential  distortion  vector  and  pt  are  constant  coefficients. 


' 2Pl(sr’)(sr’)  +P2  [r2  +  2(arOJ)2]\ 
,pi  [r2 + 2(Sro2 + 2P2(sro(Srj)]j 


2. 4- 2. 3  Skew  Factor.  The  skew  factor,  ac ,  refers  to  the  orthogonality 
of  the  pixel  array’s  x  and  y- axes  and  accounts  for  the  possibility  that  the  imaging 
array  is  non-rectangular.  For  most  cameras,  the  skew  factor  is  nearly  zero.  The 
greater  the  skew  factor,  the  further  from  90°the  angle  between  the  image  frame’s  x 
and  y- axes  is.  When  present,  the  skew  factor  appears  in  the  upper  middle  position  of 
the  matrix  Tpix  as  shown  in  Equation  (12.251). 
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2.4.24  Distorting  Camera  Model.  Combining  all  of  the  effects  dis¬ 
cussed  in  the  previous  three  sections  yields  a  camera  model  that  distorts  the  projec¬ 
tion  of  sc  onto  the  image  plane  and  then  converts  the  distorted  projection  into  pixel 
coordinates.  This  complete  model  is  shown  in  Equation  (12.261). 
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Unfortunately,  an  inverse  map  to  get  from  distorted  pixel  coordinates  to  a  camera- 
frame  line  of  sight  vector  cannot  be  determined  in  closed  form.  “Because  of  the 
high  degree  distortion  model,  there  exists  no  general  algebraic  expression  for  this  in- 
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verse  map”  [3] .  Instead,  the  distortion  removal  is  performed  using  iterative  numerical 
methods. 


2. 4- 2. 5  Calibration.  With  a  distortion  model  defined,  the  camera  can 
be  calibrated  to  determine  the  distortion  parameters.  These  parameters  include  the 
radial  distortion  coefficients,  /q,  the  tangential  distortion  coefficients,  pj ,  the  pixel 
coordinates  of  the  principal  point,  (c^x,  c%x)  and  the  skew  factor,  ac.  They  are  deter¬ 
mined  by  photographing  a  subject  containing  features  with  known  coordinates,  such 
as  the  calibration  board  shown  in  Figure  12.91  in  varying  orientations  and  observing 
the  difference  between  the  actual  projections  of  the  features  onto  the  image  plane 
from  those  predicted  by  the  pinhole  camera  model.  The  calibration  is  only  valid  for 
a  particular  focal  length  and  zoom  setting.  Bouguet’s  Camera  Calibration  Toolbox 
for  Matlab  is  one  tool  that  can  be  used  to  determine  the  distortion  parameters  [3]. 
Besides  the  distortion  parameters  already  mentioned,  the  toolbox  also  provides  values 
for  the  focal  length  measured  in  both  vertical  and  horizontal  pixels  (in  case  the  pixels 
are  not  square)  which  appear  in  the  upper  left  and  center  positions,  respectively,  of 
the  matrix  Tpclx . 

2.5  Digital  Image  Processing 

Now  that  a  camera  model  has  been  developed  which  describes  how  images  are 
produced,  techniques  used  to  process  the  data  the  images  provide  will  be  presented. 
Methods  of  edge  and  line  detection  are  presented,  followed  by  their  application  to¬ 
wards  the  detection  of  vanishing  points.  Lastly,  methods  of  determining  camera  atti¬ 
tude  relative  to  a  scene  based  on  the  projections  of  vanishing  points  onto  the  image 
plane  are  presented. 

2.5.1  Edge  Detection.  One  common  preprocessing  step  in  analyzing  digital 
images  is  to  find  edges,  i.e. ,  points  where  the  magnitude  of  the  gradient  is  high  in 
one  direction  as  compared  with  the  rest  of  the  image.  Pixels  where  this  gradient 
magnitude  is  above  a  particular  threshold  are  identified  as  edges,  or  edgels. 
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Figure  2.9:  Calibration  image.  A  calibration  board  such  as  this  is  used  to  determine 
a  camera’s  distortion  parameters.  The  effects  of  radial  distortion  are  apparent  in  the 
upper  right  corner  where  the  rightmost  column  of  squares  appears  to  curve  inward. 

Rather  than  actually  computing  a  derivative  of  the  image,  often  the  gradient 
is  approximated  by  evaluating  the  convolution  of  the  image  with  a  small  kernel  or 
“mask”.  Common  convolution  kernels  used  for  this  task  include  those  proposed  by 
Roberts  ra.  Prewitt  m,  and  Sobel  [28]  shown  in  Figure  12.101  Convolution  of  the 
digital  image  with  the  first  mask  of  each  pair  produces  the  gradient  in  the  vertical 
direction,  Gy,  and  convolution  with  the  second  mask  produces  the  gradient  in  the 
horizontal  direction,  Gh- 
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1 

0 

-1 

1 

0 

(a)  Roberts  masks 


Figure  2.10:  Common  gradient  operators.  Convolution  kernels  such  as  these  are 

commonly  used  to  approximate  the  derivative  of  digital  images.  The  2s  in  the  center 
column  and  row  of  the  Sobel  kernels  provide  a  smoothing  effect  which  is  helpful  in 
suppressing  noise. 
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As  Gonzalez  and  Woods  [TO]  observed,  2x2  masks  are  simple  computationally, 
but  the  symmetry  about  a  center  point  offered  by  odd-dimensioned  masks  is  more 
useful  for  determining  edge  directions.  Larger  masks  provide  more  accurate  approx¬ 
imations  of  the  derivative,  since  they  incorporate  more  information  into  each  calcu¬ 
lation.  However,  larger  masks  also  require  many  more  computations,  since  for  a  pair 
of  NxN  masks,  2N2  products  and  2(N2  -  1)  sums  must  be  performed  for  each  pixel. 

The  true  gradient  magnitude  is  determined  by  evaluating  the  Euclidean  norm 
of  the  vertical  and  horizontal  gradients  for  each  pixel,  but  the  less  computationally 
expensive  method  of  simply  adding  their  absolute  values  as  shown  in  Equation  (12.2711 
is  commonly  used  when  constrained  by  data  processing  capacity. 

l|G(*,j)IHGv(i,j)MGa(*,i)l  (2-27) 

In  [6j,  Canny  proposed  that  strong  and  weak  edges  be  determined  by  establishing 
both  upper  and  lower  gradient  thresholds.  Strong  edges  occur  where  the  magnitude  of 
the  image  gradient  is  above  the  upper  threshold.  Weak  edges  occur  where  the  gradient 
is  between  the  upper  and  lower  thresholds.  Only  weak  edges  which  are  adjacent  to 
strong  edges  are  declared  as  edgels  in  the  final  edge  image. 

Regardless  of  the  method  used,  the  end  result  of  an  edge  detection  operation 
is  a  binary  image,  where  ones  represent  pixels  that  are  declared  as  edges  and  zeros 
represent  all  other  pixels.  Figure  [2JJ]  shows  the  result  of  performing  the  Canny  edge 
detection  operation  on  an  image  of  a  hallway. 

2.5.2  Line  Detection.  The  problem  of  identifying  straight  lines  in  digital 
images  has  been  investigated  by  many  researchers  and  a  plenitude  of  methods  have 
been  developed.  However,  most  methods  are  at  least  loosely  based  on  either  the 
Hough  transform  from  P2j  or  Burns’  line  extractor  from  0. 
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(a)  Hallway  image  (b)  Canny  edge  image 


Figure  2.11:  Canny  edge  image,  (a)  An  image  of  a  hallway,  (b)  This  binary  image 
results  from  performing  Canny  edge  detection  on  the  image  shown  in  (a). 

2.5.2. 1  Hough  Transform  Methods.  The  Hough  transform  was  devel¬ 
oped  and  patented  by  Paul  Hough  in  1962  as  a  means  for  identifying  complex  patterns. 
The  method  involves  establishing  an  n-parameter  representation  of  the  pattern  be¬ 
ing  sought,  and  then  generating  an  n-dimensional  accumulator  space  for  determining 
how  many  observations  support  the  presence  of  the  pattern.  The  accumulator  space 
is  essentially  a  histogram,  in  which  peaks  appear  that  support  likely  instances  of  the 
pattern  being  sought. 

In  order  to  use  the  Hough  transform  to  find  lines  in  digital  images,  an  appropri¬ 
ate  parameterization  of  a  line  must  be  selected.  Lines  in  two-dimensional  image  space 
have  only  two  degrees  of  freedom,  so  only  two  parameters  are  required  to  describe 
them.  Often,  lines  are  represented  by  a  slope,  m,  and  ^-intercept,  b ,  as  shown  in 
Equation  (I2. 28jl . 

y  =  mx  +  b ;  (2.28) 

However,  this  parameterization  presents  difficulties  when  the  Hough  transform  is  ap¬ 
plied,  because  the  slope  parameter  is  unbounded,  as  manifest  by  the  infinite  slope  of 
vertical  lines. 
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Duda  and  Hart  proposed  the  polar  representation  of  a  line  given  in  Equa¬ 


tion  (12. 29[) .  which  provides  a  fully  bounded  parameter  space  (8j. 

p  —  x  cos  0  +  y  sin  9  (2.29) 

The  parameters  p  and  9  represent  the  length  and  angular  distance  from  the  x-axis 
of  the  shortest  line  segment  joining  the  origin  of  a  digital  image  to  the  line  being 
observed,  as  shown  in  Figure  12.121  The  parameter,  9 ,  can  vary  between  ±90°,  and 


Figure  2.12:  Hough  line  parameters.  The  parameters  p  and  9  represent  the  length 
and  angular  distance  from  the  x-axis  of  the  shortest  line  segment  joining  the  origin  of 
a  digital  image  to  the  line  being  observed. 

the  parameter,  p,  can  vary  between  ±d,  where  d  is  the  diagonal  of  the  image  frame  in 
pixels. 

An  edgel  with  coordinates  ( x ,  y )  supports  the  presence  of  every  line  whose  pa¬ 
rameters  satisfy  Equation  (I2.29p.  Thus,  points  in  the  image  are  represented  by  si¬ 
nusoidal  curves  in  the  Hough  parameter  space,  as  shown  in  Figure  12.131  Conversely, 
points  in  Hough  parameter  space  represent  lines  in  the  image.  When  many  collinear 
edgels  are  present  in  an  image,  their  representations  in  Hough  parameter  space  stack 
on  top  of  one  another  in  the  accumulator.  Peaks  in  the  accumulator  correspond  to 
pairs  of  parameter  values  representing  lines  in  the  original  image,  as  shown  in  Fig¬ 
ure  [2TT41 
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(a)  Subset  of  lines  supported  by  an  edgel  (b)  Hough  parameter  space  for  a  single  edgel 

Figure  2.13:  Hough  transform  of  a  single  edgel.  (a)  A  small  subset  of  the  lines 

supported  by  a  single  edgel  are  displayed,  (b)  The  edgel  shown  in  (a)  is  represented 
by  the  curve  shown  here  in  Hough  parameter  space. 


(a)  Collinear  edgels  (b)  Hough  parameter  space  of  multiple 

edgels 


Figure  2.14:  Hough  transform  of  multiple  collinear  edgels.  (a)  All  edgels  shown 

here  are  collinear.  (b)  The  edgels  shown  in  (a)  are  represented  by  these  curves  in 
Hough  parameter  space.  The  individual  curves  all  overlap  at  the  point  corresponding 
to  the  parameters  of  the  line  passing  through  the  edgels  in  (a). 
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Several  variations  of  the  Hough  transform  have  been  developed,  including  the 
probabilistic  Hough  transform  presented  in  [18].  Kiryati  et  al.’s  method  involves 
selecting  a  random  subset  of  the  collection  of  edgels  in  the  image  and  performing  the 
Hough  transform  on  only  those  points.  Since  strong  instances  of  a  particular  pattern 
in  an  image  will  be  represented  by  large  peaks  in  the  Hough  parameter  space,  the 
smaller  subset  will  still  identify  these  instances  with  high  likelihood.  This  sampling 
method  reduces  the  number  of  computations  required  without  significantly  impacting 
the  performance  in  terms  of  pattern  detection. 

2. 5. 2. 2  Gradient  Orientation  and  Connected  Component  Methods. 

In  [5],  Burns,  Hanson  and  Riseman  demonstrate  that  edge  orientation  carries  impor¬ 
tant  information  about  the  presence  of  lines,  and  can  be  exploited  for  line  identifica¬ 
tion.  They  observed  that  lines  are  characterized  by  neighboring  pixels  whose  gradient 
orientations  are  roughly  orthogonal  to  the  line.  The  Burns  line  extractor  uses  this 
characteristic  to  fit  lines  to  regions  of  neighboring  pixels  with  similar  gradient  direc¬ 
tions.  All  gradient  orientation  line  detection  algorithms  follow  the  following  basic 
steps: 

1.  Group  pixels  by  common  gradient  direction 

2.  Find  collections  of  neighboring  pixels  within  each  grouping.  These  are  called 
“line  support  regions.” 

3.  Fit  a  line  to  each  line  support  region. 

Burns  et  al.’s  method  begins  by  using  convolution  kernels  such  as  those  discussed 
in  Section  f2.5. II to  calculate  the  image  gradient.  The  direction  of  the  gradient  at  each 
pixel  is  determined  by  Equation  (I2.30p  with  the  output  of  the  arctangent  function 
corrected  by  quadrant. 

G 

Gj  =  tan-1  — ^  (2.30) 

CHj 

Once  a  gradient  direction  is  determined  for  each  pixel,  the  pixels  are  sorted  by  gra¬ 
dient  direction  into  groups  representing  coarsely  partitioned  regions  of  the  interval 
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[0, 27r)  as  shown  in  Figure  12.151  The  number  of  divisions  and  boundary  locations 
are  determined  arbitrarily,  with  some  researchers  recommending  overlapping  regions 
to  prevent  fragmentation  of  lines  whose  orientations  may  coincide  with  the  division 
boundaries.  Next,  a  connected  components  algorithm  is  used  to  find  groupings  of 

5  7i  3  n 


Figure  2.15:  Circle  divisions.  Pixels  are  grouped  by  similar  gradient  orientation 

into  one  of  the  regions  shown. 

pixels  with  similar  gradient  orientation  that  are  also  neighbors.  Each  collection  of 
connected  pixels  with  similar  gradient  orientation  is  declared  a  line  support  region 
from  which  a  straight  line  may  be  extracted. 

Researchers  have  demonstrated  different  methods  for  fitting  lines  to  line  support 
regions.  Burns  et  al.  fit  a  plane  to  the  intensity  surface  underlying  each  line  support 
region  [5].  They  then  determined  lines  by  intersecting  the  intensity  surface  plane  with 
a  horizontal  plane  at  the  mean  region  intensity  value.  This  method  gives  good  results 
in  terms  of  identifying  straight  lines  in  images  and  also  facilitates  the  computation 
of  various  line  characteristics  such  as  length,  contrast,  width,  location,  orientation 
and  straightness.  Others  have  modified  the  method  to  enable  faster  image  processing 
when  computational  limitations  are  encountered  [IT]. 
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In  [H] ,  Kahn,  Kitchen  and  Riseman  developed  a  line  extraction  algorithm  they 
call  the  “fast  line  finder”  (FLF)  which  fits  lines  to  line  support  regions  by  finding 
the  major  axis  of  an  ellipse  fit  to  the  region.  Their  method  was  motivated  by  a 
need  to  guide  a  land-based  robot  along  a  path  using  visual  cues.  It  is  intended  to 
be  implemented  on-board  the  mobile  platform  to  primarily  identify  the  edges  of  the 
path  the  robot  is  to  follow. 

For  a  line  support  region,  composed  of  N  individual  pixels,  a  scatter  matrix, 
A,:,  is  determined  by  Equation  (j2.3ip 


A*  = 


bi 


bt 

Ci 


(2.31) 


where  the  matrix  elements 

cii,  bi  and 

Ci  are  given  by  Equation  (J2., 
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The  coordinates  ( Xj,yj )  give  the  position  of  each  member  of  the  line  support  region, 
and  each  weighting  factor,  w3 ,  is  equal  to  the  gradient  magnitude  of  the  j-th  pixel. 
Kahn  et  al.  opted  for  the  standard  alternative  form  for  determining  the  scatter  matrix 
shown  here  because  it  does  not  require  the  independent  calculation  of  a  region  centroid 
prior  to  accumulating  the  partial  sums. 


The  scatter  matrix  has  two  eigenvalues,  A l  and  A with  their  corresponding 
eigenvectors,  and  e^.  One  of  the  eigenvalues,  A £,,  is  most  likely  much  larger  than 
the  other.  In  fact,  the  ratio  of  Xs/Xl  can  be  used  as  a  metric  to  describe  how  line- like 
the  line  support  region  is.  The  smaller  the  ratio,  the  longer  and  narrower  the  ellipse 
that  is  fit  to  the  region. 
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The  eigenvectors  of  the  scatter  matrix  give  the  directions  of  the  major  and  minor 
axes  for  an  ellipse  fit  to  the  line  support  region,  with  describing  the  major  axis. 
The  line,  fit  to  line  support  region,  f?*,  is  fully  described  by  the  vector  passing 
through  the  centroid  of  the  line  support  region  whose  orientation  is  described  by 
Equation  (I2.33j)  with  the  arctangent  function  corrected  by  quadrant. 

9i  -  tan^1  ^  (2.33) 

Endpoints  of  the  line  are  determined  by  finding  the  intersections  of  lt  with  the  bound¬ 
aries  of  the  line  support  region. 

In  pT9] ,  Kosecka  and  Zhang  modified  the  line- fitting  process  further  by  omitting 
the  weighting  factors  used  in  computing  the  scatter  matrix.  Their  method  more  closely 
resembles  the  standard  form  for  calculating  a  covariance  matrix  from  the  coordinates 
of  the  pixels  comprising  each  line  support  region.  First,  a  mean  coordinate  pair, 
(Xi,yi)  is  calculated  from  the  coordinates  of  each  pixel  in  the  line  support  region. 
Then,  the  elements  of  the  scatter  matrix  are  determined  using  Equation  (I2.34j). 

N 

ai  =  Y,(X3  ~ 
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bi  =  Y,(x3-2i)(yj-yi)  (2-34) 
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3= 1 

Again,  the  eigenvalues  and  eigenvectors  of  the  scatter  matrix 
the  orientation  of  /,,  but  Kosecka  and  Zhang’s  method  uses 
Equation  (I2.35j)  corrected  by  quadrant. 

6i  =  tan-1  — —  (2.35) 

eSy 


are  used  to  determine 
the  relation  shown  in 
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Each  line  is  then  described  using  the  p  -  6  parameterization  from  Section  12.5.2.11  by 
using  the  coordinates  of  the  centroid  of  each  line  support  region,  ( Xi,yi )  and  the 
orientation,  dj  as  the  inputs  to  Equation  (I2.29p. 

2. 5. 2. 3  Method  Comparison.  In  m,  Kessler  et  al.  compare  the 
speed  of  various  line  detection  methods.  Each  of  four  different  methods  including 
the  Hough  transform  discussed  in  Section  12.5.2. II and  Kosecka  and  Zhang’s  connected 
components  algorithm  discussed  in  Section  12.5.2.21  are  implemented  on  a  2.5  GHz 
computer  using  Matlab.  Their  results  show  that  Kosecka  and  Zhang’s  method  of 
line  detection  is  35%  faster  than  the  Hough  transform  when  processing  a  1024x768 
resolution  image  and  40%  faster  when  processing  a  512x384  resolution  image. 


2. 5. 2. 4  Representing  Image  Lines  in  3-Space.  In  [1],  Barnard  describes 
how  every  line  in  an  image  can  be  imagined  to  represent  a  plane  in  3-space  which  is 
defined  by  any  two  points  on  the  line  and  the  optical  center  of  the  camera.  This  plane, 
called  an  interpretation  plane,  can  be  described  mathematically  by  a  unit  normal 
vector  li  as  shown  in  Figure  12.161  The  normal  vector  is  determined  by  calculating 
the  normalized  cross  product  of  the  vectors  pointing  to  the  two  points  on  the  line  as 
shown  in  Equation  (12.361). 


lc  = 

li 


s,  x  st 


IS'j  X  s(,| 


(2.36) 


Figure  2.16:  Planar  representation  of  an  image  line.  Lines  in  images  can  be  de¬ 

scribed  by  unit  normals  of  planes  in  3-space. 
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Figure  2.17:  Intersection  of  parallel  image  lines.  The  intersection  of  any  pair  of 

lines  in  an  image  is  expressed  as  the  cross  product  of  their  planar  normals. 

This  planar  representation  of  image  lines  gives  rise  to  the  following  three  axioms: 


1.  The  line  l  joining  two  points  px  and  p2  is  given  by: 

=  px  xp2 

l|Pl  XP2II 

2.  The  point  p  where  two  lines  l\  and  1 2  cross  is  given  by: 

Zi  x  l2 


(2.37) 


(2.38) 


3.  Any  point  p  on  line  l  must  satisfy: 


pTl  =  0 


(2.39) 


It  is  worth  noting  that  the  relationship  shown  in  item  [2]  is  true  of  any  pair  of  lines 
in  the  image,  including  those  that  are  parallel  on  the  image  plane.  Such  a  case  is 
illustrated  in  Figure  12.171  Item  |3]  provides  a  useful  metric  for  determining  how  close 
a  point  is  to  a  particular  line.  The  greater  the  magnitude  of  their  inner  product,  the 
further  p  is  from  l. 
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2.5.3  Vanishing  Points.  The  projections  of  lines  which  are  parallel  in  the 
world  appear  to  converge  at  a  fixed  point  in  perspective  images.  This  phenomenon 
is  particularly  visible  in  photographs  of  long  corridors  such  as  the  one  shown  in  Fig¬ 
ure  12.11(a).  The  point  at  which  parallel  lines  appear  to  intersect  is  known  as  a 
vanishing  point,  and  represents  the  projection  of  a  point  infinitely  distant  from  the 
camera  onto  the  image  plane.  The  projection  of  a  vanishing  point  onto  the  image 
plane  is  invariant  with  relative  translation,  and  is  only  affected  by  relative  rotation 
between  the  camera  and  scene  inj.  This  property  makes  vanishing  point  detection 
and  tracking  an  effective  means  of  determining  the  camera’s  attitude  with  respect  to 
a  scene  containing  many  parallel  lines. 


2.5.3. 1  Vanishing  Point  Detection.  A  vanishing  point  (VP)  is  the 
point  at  which  the  projections  of  parallel  lines  in  a  structured  scene  appear  to  con¬ 
verge.  Thus,  the  detection  of  vanishing  points  in  images  of  structured  environments 
is  reduced  to  finding  points  where  many  lines  intersect.  To  accomplish  this  task, 
Barnard  proposes  a  method  similar  to  the  Hough  transform.  First  an  accumulator 
representing  the  Gaussian  sphere,  i.e.,  a  sphere  centered  at  the  camera’s  optical  cen¬ 
ter  with  radius  equal  to  one,  is  constructed.  The  sphere  is  parameterized  by  azimuth 
angle,  a  ranging  from  0  to  2n  radians,  and  elevation  angle,  /3,  ranging  from  -|  to  | 
radians.  Each  element  of  the  array  represents  a  particular  range  of  azimuth  and  ele¬ 
vation  angles.  Because  these  elements  represent  uniform  angular  spacing,  they  do  not 
represent  equal  portions  of  the  sphere’s  surface.  Every  line  in  the  image  is  projected 
onto  the  Gaussian  sphere  by  intersecting  its  interpretation  plane  with  the  sphere. 
These  intersections  form  great  circles  on  the  sphere’s  surface.  The  accumulator  is 
populated  by  incrementing  the  elements  which  satisfy  Equation  (|2.40jh  (For  cases 
where  ly  is  very  small,  an  alternate  form  exists  expressing  a  as  a  function  of  (3  with 
lx  in  the  denominator.) 


j3  =  tan  1 


-L  sin  a  -  L  cos  a: 


(2.40) 
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Peaks  in  the  accumulator  give  the  azimuth  and  elevation  of  points  where  many  lines 
intersect,  which  are  likely  vanishing  points.  Limitations  of  this  method  include  the 
uneven  spacing  of  accumulator  elements  on  the  sphere’s  surface  and  the  ambiguous 
azimuth  angle  of  vectors  pointing  to  either  pole. 

In  [2D],  Magee  and  Aggarwal  use  a  similar  approach,  but  rather  than  tracing 
circles  in  a  discretization  of  the  Gaussian  sphere,  they  calculate  an  (a,  /3)  pair  for  the 
intersection  of  each  possible  pairing  of  image  lines  using  Equations  (12.4ip  and  (I2.42p. 


a  =  tan-1  (  — ) 

\Px) 


(2.41) 


and 

P  =  tarr1  (  Vz  |  (2. 

\  V  Px  +Py) 

Then,  groupings  of  intersections  with  an  arc  distance  between  them  that  is  below  a 
threshold  are  formed.  If  a  large  enough  group  is  found,  the  associated  (a,/3)  pair  is 
declared  a  vanishing  point,  the  lines  associated  with  that  grouping  are  removed  from 
the  set  of  image  lines,  and  the  process  repeats.  This  method  overcomes  the  problem 
stemming  from  the  discretization  of  the  sphere  into  unevenly  sized  segments,  but  still 
is  subject  to  ambiguous  azimuth  at  the  poles.  Both  Barnard’s  and  Magee  and  Aggar- 
wal’s  methods  also  are  somewhat  computationally  burdensome  with  the  requirement 
to  either  trace  circles  on  the  sphere  for  every  line  or  compute  intersections. 


2. 5. 3. 2  Random  Sample  Consensus.  In  [Dj,  Fischler  and  Bollcs  intro¬ 
duce  the  Random  Sample  Consensus  (RANSAC)  algorithm  as  part  of  their  method  for 
determining  the  position  of  a  camera  based  on  an  image  of  landmarks  with  known 
locations.  The  method  provides  a  way  to  robustly  fit  a  model  to  a  set  of  data  con¬ 
taining  a  certain  proportion  of  outliers  using  a  statistically  driven  guess  and  check 
scheme.  When  applied  to  the  task  of  finding  vanishing  points  in  images,  it  enables 
the  detection  of  clusters  of  mutually  intersecting  lines  without  requiring  that  every 
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possible  intersection  be  explicitly  calculated.  The  method  is  described  here  in  general 
terms  and  applied  to  the  detection  of  vanishing  points  in  Chapter  IIIII 

Unlike  conventional  algorithms  for  fitting  a  model  to  experimental  data  that 
can  be  influenced  by  what  Fischler  and  Boles  call  “gross  deviations,”  the  ransac 
algorithm  is  robust  to  such  outlying  data.  “Rather  than  using  as  much  of  the  data 
as  possible  to  obtain  an  initial  solution  and  then  attempting  to  eliminate  the  invalid 
data  points,  ransac  uses  as  small  an  initial  data  set  as  feasible  and  enlarges  this  set 
with  consistent  data  when  possible.”  [9]  The  process  for  establishing  a  model  from  a 
set  i S'  of  datum  points  that  is  known  to  contain  a  proportion  e  of  outliers  follows  the 
basic  steps  outlined  below. 

1.  Randomly  select  a  minimum  subset  s  from  S  and  instantiate  the  model  with  s. 

2.  Determine  the  set  of  points  Si  that  is  within  a  threshold  t  of  the  model  estab¬ 
lished  by  s.  The  set  St  is  the  consensus  set  of  S. 

3.  If  the  size  of  St  is  greater  than  a  threshold  T,  re-estimate  the  model  based  on 
all  the  points  in  Si  and  terminate. 

4.  If  the  size  of  S'*  is  less  than  T  and  fewer  than  N  trials  have  been  performed, 
select  a  new  random  minimum  subset  s  and  repeat  steps  2  through  4. 

5.  After  N  trials,  re-estimate  the  model  with  the  largest  consensus  set  St  and 
terminate. 

There  are  three  unspecified  parameters  used  in  implementing  the  ransac  model 
estimation  algorithm,  specifically  the  threshold  t  for  declaring  data  as  either  inkers  or 
outliers,  the  threshold  T  which  determines  how  many  data  should  fit  the  model  before 
terminating,  and  the  maximum  number  N  of  random  minimum  subsets  to  examine 
before  terminating  the  process. 

The  threshold  value  t  used  for  declaring  data  as  either  inkers  or  outliers  is  often 
determined  empirically.  Alternately,  by  assuming  measurement  errors  are  zero-mean 
with  a  known  standard  deviation,  t  can  be  computed  from  a  x2  distribution.  The  size 
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threshold  T  for  determining  what  is  an  adequately  large  consensus  set  is  determined 
from  the  expected  number  of  outliers  as  in  Equation  (12.431) . 


T=(l-e)S 


(2.43) 


Lastly,  the  minimum  number  of  iterations  N  required  to  be  assured  with  probability 
p  that  at  least  one  minimum  subset  s  is  free  from  outliers  is  determined  by  Equa¬ 


tion  (12.441) 


N  = 


log(l  -p) 


(2.44) 


log(l-  (l-e)s) 

where  e  is  the  proportion  of  outliers  expected  to  be  found  in  S.  Naturally,  p  is 
preferred  to  be  very  nearly  equal  to  1,  with  0.99  frequently  used  in  practice. 


In  many  practical  applications,  e  may  not  be  known.  Under  these  circumstances, 
the  threshold  T  of  inliers  needed  to  end  the  loop  cannot  be  determined.  Instead,  only 
the  minimum  number  of  iterations  is  used  for  deciding  when  to  terminate  the  process. 
In  order  to  calculate  the  minimum  number  of  iterations  to  perform,  a  worst  case  value 
can  be  used  initially,  and  both  e  and  N  can  be  recomputed  in  subsequent  iterations. 
If  one  random  minimum  subset  s  produces  a  proportion  of  outliers  smaller  than  the 
current  value  of  e,  N  is  recomputed  from  Equation  (12.441)  using  the  new,  smaller  e. 
In  the  case  where  N  is  found  to  be  smaller  than  the  number  of  iterations  that  have 
already  been  performed,  the  algorithm  terminates  and  the  largest  consensus  set  is 
used  to  estimate  the  model. 


2. 5. 3. 3  Determining  Attitude  From  Vanishing  Points.  Various  re¬ 
searchers  have  used  vanishing  points  to  determine  camera  orientation  with  respect  to 
the  scene.  In  m,  Schuster  et  al.  describe  the  use  of  vanishing  points  to  determine 
the  heading  of  a  ground-based  robotic  vehicle.  Their  camera  is  pitched  upward  so  as 
to  view  the  grid  of  rectangular  ceiling  tiles  inside  a  building.  The  desired  direction  of 
travel  is  parallel  to  the  short  axis  of  the  tiles,  so  the  corresponding  vanishing  point 
is  used  to  determine  heading  and  make  corrections  in  a  feedback  controller.  Their 


38 


algorithm  is  limited  in  its  scope,  since  only  one  attitude  angle  is  determined.  Also,  if 
the  image  plane  is  parallel  to  the  vanishing  direction,  a  finite  vanishing  point  cannot 
be  found  using  their  method. 

In  [13]  Johnson  demonstrates  the  use  of  vanishing  points  to  find  the  attitude 
of  a  quadrotor  unmanned  vehicle  within  an  indoor  corridor.  His  method  uses  the 
vanishing  point  along  the  length  of  the  corridor  to  directly  calculate  pitch  6  and  yaw 
-0  with  the  relationships  shown  in  Equations  (I2.45[)  and  (I2.46P 

0  =  tan_1^^j  (2-45) 

6  -  tan-1  j  cos0  (2.46) 

where  7 x  and  7^  are  the  distance  in  the  camera  frame’s  x  and  y  directions,  respectively, 
of  the  vanishing  point  from  the  principal  point  on  the  image  plane.  The  vanishing 
point  in  the  downward  direction  7  is  used  to  calculate  the  roll  angle  as  shown  in 
Equation  (12.4711. 

(f)  -  -  tan-1  f  )  (2.47) 

V  lx  ) 

These  attitude  angles  are  then  combined  with  the  solution  from  an  inertial  sensor  in  a 
Kalman  filter.  The  locations  of  the  vanishing  points  in  one  image  are  used  as  starting 
points  to  look  for  vanishing  points  in  the  next  image.  Though  a  Kalman  filter  is  used 
to  combine  the  state  estimates,  the  inertial  data  are  not  used  to  aid  the  vision  routine. 

In  ei,  Borkowski  and  Veth  illustrate  the  benefits  of  using  inertial  data  to  predict 
where  the  vanishing  point  will  appear  in  the  Hough  accumulator  space  and  windowing 
about  that  point.  The  windowed  Hough  space  is  deemed  the  “predictive  Hough  space” 
and  peaks  corresponding  to  lines  supporting  the  vanishing  point  are  sought  within 
it.  This  method  is  compared  with  inertial-only  and  non-predictive  Hough  transform 
methods  and  shown  to  be  superior  in  terms  of  reducing  attitude  errors. 
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2.6  Kalman  Filtering 


With  inertial  and  imaging  methods  now  described,  a  method  for  merging  mea¬ 
surements  from  both  types  of  sensor  is  required.  This  combination  is  performed  inside 
of  a  Kalman  filter. 

In  1960,  Rudolf  Kalman  published  his  method  for  linear  estimation  in  the  Jour¬ 
nal  of  Basic  Engineering  HE  This  method,  which  has  come  to  be  known  as  the 
Kalman  filter,  uses  Bayesian  statistics  to  optimally  combine  a  dynamics  model  and 
sensor  measurements  to  produce  a  minimal-uncertainty  estimate  of  quantities  of  inter¬ 
est.  An  outline  of  Kalman’s  method  follows,  based  primarily  on  Dr.  Peter  Maybeck’s 
presentation  of  the  topic  in  [21]  and  [22] .  Though  an  equivalent  continuous-time  algo¬ 
rithm  also  exists,  the  Kalman  filter  is  presented  here  as  a  discrete-time  method,  since 
it  will  ultimately  be  implemented  on  a  digital  computer. 

2. 6. 1  Linear  Kalman  Filter.  Some  physical  systems  are  adequately  modeled 
in  linear  stochastic  differential  equation  form  as: 


x{t)  =  Fx(t )  +  Bu{t )  +  Gw(t ) 


(2.48) 


where  a?  is  a  vector  of  state  variables,  u  is  a  vector  of  control  inputs,  and  w  is  a  vector 
of  zero-mean,  white,  Gaussian  noise  sources  with  autocorrelation: 


E  [w(t)w(f  +  r)]  =  QS(r) 


(2.49) 


The  matrices  F,  B:  and  G  are  populated  with  constant  coefficients,  and  5(r)  is  the 
Dirac  delta  function.  Sensor  measurements  for  such  a  system  may  also  be  modeled 


by: 


z{ti)  =  Hx(ti )  +  v(U) 


(2.50) 
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where  z  is  a  vector  of  measurements,  H  is  a  matrix  of  constant  coefficients,  and  v  is 
a  vector  of  zero-mean,  white,  Gaussian  noise  sources  with  autocorrelation: 

E  [v(U)v(tj)]  =  RSij  (2.51) 

The  symbol  8ij  represents  Kronecker’s  delta  function.  For  such  a  linear  system,  the 
standard  Kalman  filter  is  the  optimal  algorithm  for  estimating  the  random  state 
vector,  x ,  in  terms  of  minimizing  uncertainty  in  a  least  squares  sense. 

The  Kalman  filter  provides  a  Gaussian  probability  density  function  (pdf)  for 
the  random  vector,  x,  at  each  instant  in  discrete  time  conditioned  on  the  uncertain 
measurements  provided  by  the  sensors.  Beginning  with  initial  conditions,  the  state 
estimate,  x ,  and  covariance,  Pxx ,  are  propagated  from  one  instant  in  discrete  time  to 
the  next  using  a  state  transition  matrix,  <E»,  which  is  determined  by: 

-  eFAt  (2.52) 

where  At  is  the  time  step  between  discrete  time  instants.  The  propagation  is  given 
by: 

x{t~i+l)  =  ®x(t+)  +  Bu(U)  (2.53) 

Pxxfc i)  -  <S>Pxx(t+)<f>T  +  Qd  (2.54) 

where  Qd  is  the  discrete-time  process  noise  strength  matrix.  The  calculation  of  Qd  is 
not  as  simple  as  determining  <f>,  but  can  be  accomplished  using  the  process  proposed 
by  Van  Loan  in  [50]. 

At  discrete  instants  in  which  measurements  are  available,  these  measurements 
are  used  to  update  the  state  estimate  and  covariance  by  optimally  combining  the 
dynamics  model  estimate  and  uncertain  measurements  through  the  use  of  the  Kalman 
gain  matrix,  K ,  which  is  given  by: 


K(U)  =  Pxx{t-)HT  [HPxx(t;)HT  +  Rl1  (2.55) 
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The  dynamics  model  prediction  and  measurements  are  combined  to  produce  a 
new  state  estimate  and  covariance  using  the  following  relationships: 

x(tt)  =  x(tj)  +  K(ti)[z(ti)  -  Hx(t;)]  (2.56) 

Pxx(tt)  =  Pxx(t~)  -  K{ti)HPxx(tl)  (2.57) 

The  quantity  z(ti)-Hx(t^)  which  appears  in  Equation  (12.561)  is  known  as  the  “resid¬ 
ual,”  and  represents  the  difference  between  the  measurement  realization  and  the  mea¬ 
surement  prediction  from  the  dynamics  model.  The  expression  HPxx(t^)HJ  +R  that 
appears  in  the  Kalman  gain  equation  (12.55ft  is  the  residual  covariance.  The  Kalman 
gain  serves  as  a  weighting  factor  to  give  the  appropriate  amount  of  preference  to 
either  the  dynamics  model  prediction  or  the  measurement  based  on  their  respective 
uncertainties. 

2.6.2  Extended  Kalman  Filtering.  For  systems  which  are  not  adequately 
modeled  by  Equation  (12.48  ft.  the  filter  which  has  just  been  described  will  not  optimally 
estimate  the  states  and  their  uncertainties.  In  such  a  case,  an  extended  Kalman 
filter  (EKF)  can  be  used  to  obtain  more  accurate  estimates.  The  basic  stochastic 
differential  equation  form  of  such  a  nonlinear  system  is: 

x{t)  =  f  [x(t),u(t),t]  +  G(t)w(t)  (2.58) 

where  /  is  a  vector  of  functions,  fj,  at  least  one  of  which  is  nonlinear.  The  measure¬ 
ment  equation  for  such  a  system  may  also  be  nonlinear  of  the  form: 


z(U )  =  h[x(ti),U]  +  v(U) 


(2.59) 


where  h  is  also  a  vector  of  functions,  /i^,  at  least  one  of  which  is  nonlinear. 

An  alternative  for  overcoming  the  linear  filter’s  shortfall  is  to  linearize  about 
the  state  estimates  and  define  a  vector  of  perturbation  states.  The  perturbation 
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state  vector,  dx,  represents  the  difference  between  the  true  state  and  estimated  state 
vectors. 

5x(t)  =  x{t)  -  x{t)  (2.60) 

The  EKF  produces  a  nominal  state  trajectory  for  each  measurement  interval  by  start¬ 
ing  from  the  most  recent  state  estimate  and  integrating  the  nonlinear  state  functions 
as  shown  in  Equation  (12.61  j) . 

r~  ^i+i 

x(ti+ 1)  =  /  f[x(t),u(t),t]dt  +  x(tl)  (2.61) 

Jti 


The  estimated  perturbation  state  vector,  5x,  is  set  to  zero  at  the  beginning  of  a 
filter  cycle,  updated  during  the  measurement  update  phase,  added  to  the  nominal 
trajectory  to  correct  the  total  state  estimate,  and  reset  to  zero  before  beginning  the 
next  recursion. 

The  state  transition  matrix  used  to  propagate  the  covariance  is  found  by  first 
linearizing  the  state  functions  about  the  most  recent  state  estimate  to  obtain  the 
matrix  F  as  shown  in  Equation  (12.621). 


F(t.) 


(2.62) 


This  linearized  state  matrix  is  then  input  into  Equation  (I2.52jl  to  obtain  and 

the  uncertainty  is  propagated  with  Equation  (12.541). 

For  the  measurement  update  stage  of  the  filter,  first  a  measurement  prediction, 
Zpred,  is  determined  by  evaluating  the  nonlinear  measurement  function  at  the  most 
recent  state  estimate. 

Zprediti )  =  h[x{t~i),ti\  (2.63) 


A  measurement  perturbation,  Sz,  is  also  defined,  which  represents  the  difference 
between  the  actual  measurements  and  measurement  prediction  as  shown  in  Equa- 
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tion  (12. 641). 


Zpred.iti') 


(2.64) 


Since  zpred  was  obtained  using  the  nonlinear  measurement  equations,  Sz  is  equivalent 
to  the  residual  in  the  linear  Kalman  filter. 

In  order  to  calculate  a  Kalman  gain  matrix  and  residual  covariance,  a  linearized 
matrix,  must  be  found  in  much  the  same  way  that  F(ti)  was.  The  matrix 

H(U)  is  obtained  by  evaluating  the  derivative  of  the  measurement  equations  with  re¬ 
spect  to  the  state  vector  at  the  most  recent  state  estimate  as  shown  in  Equation  (12.651) . 


H(U) 


dh 


(2.65) 


The  linearized  matrix  iT(tj)  is  then  used  in  Equation  (j2.55j>  to  calculate  the  Kalman 
gain.  With  the  Kalman  gain  matrix  determined,  Equation  (I2.56p  can  be  used  to 
update  the  estimated  perturbation  state  vector,  8x.  Since  the  a  priori  estimate  of  the 
perturbation  state  vector  is  zero,  and  the  measurement  perturbation  is  the  residual, 
this  step  reduces  to  the  form  shown  in  Equation  fl2.66[). 


5x(t+i+1)  =  K(ti+1)8z(ti+1)  (2.66) 

Finally,  the  perturbation  state  estimate  is  added  to  the  nominal  state  trajectory  to 
update  the  total  state  estimate  as  shown  in  Equation  (I2.67j).  and  then  reset  to  zero 
for  the  next  iteration. 

*(X++i)  =  *(*,7+i)  +  6x(tt+ 1)  (2.67) 

Often,  a  certain  level  of  pseudonoise  must  be  added  to  the  system  to  account 
for  additional  uncertainty  which  results  from  the  linearization  process.  This  is  part 
of  tuning  the  filter,  and  is  typically  combined  with  simulation  to  verify  filter  perfor¬ 
mance.  Also,  EKFs  are  somewhat  sensitive  to  initialization  errors  which  can  prevent 
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effective  state  estimation,  particularly  if  there  is  limited  observability  between  the 
measurements  and  states. 

2. 7  Summary 

This  chapter  has  put  in  place  the  foundation  upon  which  the  attitude  estimation 
methods  used  in  this  thesis  are  built.  Topics  in  nomenclature,  inertial  navigation, 
digital  imaging,  and  Kalman  filtering  have  been  presented.  The  next  chapter  describes 
how  these  concepts  are  used  to  develop  a  method  for  combining  inertial  and  optical 
information  to  estimate  attitude. 
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III.  Methodology 


This  chapter  describes  the  methods  and  procedures  used  to  approach  a  solution 
to  the  indoor  aerial  attitude  estimation  problem,  as  well  as  the  experimental 
approach  used  to  evaluate  this  solution.  These  topics  are  divided  into  two  main 
sections-algorithm  development  and  experimental  methods.  Algorithm  development 
is  presented  in  Section  13.11  and  experimental  methods  are  discussed  in  Section  13.21 


3.1  Algorithm  Development 

The  method  presented  in  this  thesis  for  estimating  attitude  using  inertial  and 
optical  data  follows  the  basic  steps  shown  on  the  flowchart  in  Figure  ITT]  These  steps 


Figure  3.1:  Image-aiding  algorithm  flow  chart.  The  process  of  combining  inertial 

and  image  data  follows  the  basic  steps  shown  here.  The  steps  inside  the  shaded  region 
comprise  the  measurement  stage  of  an  EKF. 
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parallel  the  general  outline  of  an  EKF  presented  in  Section  12.6.21  and  each  will  be 
thoroughly  described  in  the  subsequent  portions  of  this  section.  In  order  to  implement 
such  a  Kalman  filter,  a  state  vector  must  be  defined  and  both  a  dynamics  model  and 
a  measurement  model  must  be  established. 

3.1.1  State  Vector.  Attitude  is  expressed  as  the  DCM  which  transforms 
vectors  represented  in  the  vehicle  body  frame  into  their  equivalent  representation  in 
the  navigation  frame,  Cb.  Essentially,  the  true  body-to-navigation  frame  DCM  is  the 
product  of  a  corrupted  estimate  of  the  DCM,  C£,  and  another  DCM  of  errors,  C~,  as 
shown  in  Equation  (13.  lj). 

Cnb=ClCt  (3.1) 

The  quantity  we  wish  to  estimate  is  the  DCM  of  errors.  Assuming  the  errors  are 
small,  they  can  be  expressed  as  a  3-vector,  -0,  whose  elements  represent  the  rotation 
angles  required  to  correct  the  corrupted  body-to-navigation  frame  DCM,  Cb,  to  truth. 
This  vector  is  the  vector  of  perturbation  states  estimated  by  the  extended  Kalman 
filter.  Its  corresponding  DCM  can  be  expressed  as  the  matrix  exponential  of  the 
skew-symmetric  representation  of  i/j. 


Cl  =  e^x  (3.2) 

Since  the  angles  contained  in  -0  are  assumed  to  be  small,  the  matrix  exponential  can 
be  approximated  as  a  matrix  power  series  with  higher  than  Erst  order  terms  neglected. 
Substituting  this  approximation  into  Equation  (13. IK  yields  the  following  relationship: 

C^[7  +  V>x]Cf  (3.3) 

3.1.2  System  Dynamics  Model.  With  the  state  vector  defined,  the  system 
dynamics  model  can  be  established.  This  model  is  taken  from  the  differential  equation 
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governing  the  dynamics  of  Cb  shown  in  Equation  (12.121).  which  is  repeated  here. 


cl  =  «>]  -  C:Kex]C'CL*  (3.4) 

Neglecting  the  Earth’s  turn  rate  and  solving  the  differential  equation  over  one  discrete 
time-step  gives  the  relationship  shown  in  Equation  (13.51).  where  At  represents  the  step 
size  between  instants  in  discrete  time. 


Cb(ti+ 1)  «  C7(*i)exp([u4(ti)At]x)  (3.5) 

The  Earth’s  turn  rate  is  omitted  because  the  update  rate  for  this  particular  im¬ 
plementation  is  approximately  three  hertz  and  the  error  introduced  by  the  Earth’s 
rotation  over  a  period  of  one  third  of  a  second  is  negligible  relative  to  the  precision 
of  a  MEMS-grade  IMU.  Substituting  the  measured  body  rotation  rate,  ujbb  ,  for  ujbib 
and  the  estimated  body-to-navigation  frame  DCM,  Crbl,  for  Cb  gives  the  expression 
shown  in  Equation  (13. 6 j). 

Cb{t-i+ 1)  =  Cb(ti ) exp ([u;^m(£j)At]x)  (3.6) 

Equation  (I2.13j)  shows  that  the  measured  body  rotation  rate  is  the  sum  of  the  true 
body  rotation  rate,  a  vector  of  gyro  biases,  and  zero-mean,  Gaussian  noise.  If  the 
vector  of  biases  is  subtracted  from  the  measured  rotation  rates,  a  more  accurate 
inertial  solution  can  be  obtained  than  the  solution  which  results  if  this  subtraction  is 
not  performed.  Equation  (13.71)  shows  the  results  of  substituting  the  difference  between 
the  measured  body  rotation  rate  and  gyro  bias  vector  into  Equation  (13.61). 

Cl(tbl)  =  cf(tt)  exp  [([<,&)  -  b*]At)  x]  (3.7) 
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This  is  the  nonlinear  equation  used  to  propagate  Cb  from  one  instant  in  discrete  time 
to  the  next.  The  propagation  of  the  perturbation  state  uncertainty  matrix,  P^,  will 
be  described  next. 


3. 1.2.1  System  Model  Linearization.  Before  the  process  uncertainty 
can  be  propagated,  a  linearized  dynamics  matrix  must  be  found  from  which  a  state 
transition  matrix  can  be  determined.  The  first  step  in  finding  this  linearized  dynamics 
matrix  is  to  differentiate  Equation  (13. 3 II  with  respect  to  time. 

C?  =  [^x]Cj+[J  +  V>x]c£  (3.8) 

From  here,  the  DCM  derivatives,  Cb  and  Cbl  are  replaced  with  their  corresponding 
equivalent  products  as  described  by  Equation  (12.4(1 . 

cl  K„X]  =  [^}Cl  +  [/  +  (3.9) 

Substituting  Equation  (13.3(1  for  Cb  and  solving  for  [t/>x]  yields: 

=  -[/ + v>x]c;k„x]c^  +  [/ + v,x]  c;k„x]c^  (3,ioj 


Now,  a  replacement  expression  for  c obnb  will  be  described,  starting  with  the  following 
relationship: 


cA  =  uA  -  Cb  Cn<jJe 

nb  ibm  ^  e  le 


(3.11) 


Substituting  Equations  (j2. 13(1  and  (13.3(1  into  (13.1111  gives: 


=  <  +  bb  +  Wl  -  c\  [i  - 


(3.12) 


Distributing  the  product  in  the  last  term  and  rearranging  yields: 


LL> 


nb 


=  (jJ 


ib 


-  C«C>* 


+  6‘  +  w  *  +  CtW>*]C>»e 


(3.13) 
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Combining  the  first  two  terms  on  the  right  hand  side  of  the  equation  gives: 


‘^  =  ‘4t  +  6‘  +  wt  +  C£[V>x]CX«  (3.14) 

The  expression  found  in  Equation  (13.141  can  now  be  substituted  into  Equation  (I3.10K 
to  obtain: 


W>*]  =  -[J  +  V>*]Cj[u&x]C*  +  (3,15) 

[I  +  v>x]c;([«‘tx]  +  [b‘x]  +  [w‘x]  +  [(c^[i/ix]oiJx]  )ci 

Eliminating  like  terms  from  this  expression  gives: 

Wix]  =  [/  +  ([fe‘x]  +  [w‘x]  +  [(C|W>x]C>  IJx]  )cl  (3.16) 

Finally,  removing  second-order  terms,  neglecting  the  Earth’s  turn  rate  and  collapsing 
the  skew-symmetric  forms  yields: 


ifj  =  Cf}'tbh  +  Cfw6  (3-17) 

This  equation  shows  that  the  perturbation  angles’  rates  of  change  are  simply  the  sum 
of  a  vector  of  biases  and  white,  Gaussian  noise.  This,  then,  gives  rise  to  a  linearized 
state  matrix  that  is  populated  with  all  zeros  and  a  state  transition  matrix  equal  to 
identity.  The  error  matrix,  P^,  is  then  propagated  from  one  instant  in  discrete  time 
to  the  next  by: 

Pynp(t-+i)  =  +  Qd  (3.18) 

Together,  Equations  (I3.6j)  and  (I3.18P  comprise  the  “Propagate  Inertial  Solution”  step 
depicted  on  the  flowchart  shown  in  Figure  EU] 

3.1.3  Measurement  Model.  Along  with  a  system  dynamics  model,  an  EKF 
also  requires  a  measurement  model.  In  this  case,  the  measurement  model  begins 
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with  the  assumptions  regarding  the  structure  of  the  environment  enumerated  in  Sec¬ 
tion  11.2.11  Such  an  environment  will  have  three  mutually  orthogonal  primary  van¬ 
ishing  directions  that  coincide  with  the  three  principal  axes  of  the  Earth-fixed  local 
level  navigation  frame.  Mathematically,  this  means  that  there  will  be  three  vanishing 
points,  Vi,  v2  and  v3,  which  can  be  expressed  as  the  following  unit  pointing  vectors: 


T— i 

(o\ 

<1 

^3 

II 

0 

Vr2  = 

1 

.0, 

.0, 

v  = 


'o' 

0 

v1/ 


(3.19) 


Because  there  are  three  vanishing  points  inherent  in  the  structure  of  the  en¬ 
vironment,  there  are  three  measurements  for  each  image.  Inside  the  Kalman  filter, 
these  are  treated  as  consecutive  updates  to  the  vector  of  perturbation  states  where  the 
a  posteriori  estimate  of  ip  determined  from  one  vanishing  point  becomes  the  a  priori 
estimate  for  the  next.  Once  the  last  update  to  the  perturbation  state  vector  has  been 
performed,  C £  is  corrected  using  the  following  relationship: 

Cb(ti)  =  exp([*/’(tt)x])Cb(ti),  &(%) ->03xi  (3.20) 


This  corresponds  to  the  “Update  Attitude  Estimate”  and  “Reset  Perturbation  States” 
steps  shown  on  the  flowchart  in  Figure  [XU  The  DCM,  C7”,  is  not  updated  between 
the  three  vanishing  point  measurements  in  a  single  image,  so  that  each  of  the  three 
measurement  predictions  utilizes  the  same  a  priori  information  and  all  three  mea¬ 
surements  remain  mutually  independent. 

The  measurement  function  describing  the  vanishing  points’  representations  in 
the  camera  frame  is  shown  in  Equation  (I3.2ip. 

vJ  =  CJC‘v2  +  vt  V  e  [1,3]  (3.21) 
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where  vk  represents  the  measurement  noise  vector  corresponding  to  the  k- th  vanishing 
point.  Substituting  Equation  (j3.3[i  into  Equation  (I3.2ip  yields  the  following  expression 
relating  the  observed  vanishing  point  vector  in  the  camera  frame  to  the  vector  of 
perturbation  states: 


vck  =  CchC\[I  -  V>xK  +  vk  V  ke  [1,3]  (3.22) 

This  is  the  nonlinear  measurement  function  which  must  be  linearized  in  order  to 
compute  a  Kalman  gain  for  use  in  the  update  step  of  the  Kalman  filter. 

3. 1.3.1  Measurement  Model  Linearization.  Before  the  nonlinear  mea¬ 
surement  function  is  differentiated,  it  is  manipulated  algebraically  into  a  more  conve¬ 
nient  form.  Distributing  the  matrix  product  on  the  right  hand  side  of  Equation  (13.22jl 
yields: 


v£  =  C^C$v;-C£C^xK  +  vk  V  fc  e  [1,3]  (3.23) 

The  product,  [i/?x]v]!,  appearing  on  the  right  of  Equation  (13 .231)  represents  a  vector 
cross  product.  The  order  of  multiplication  can  be  reversed,  so  long  as  the  product  is 
multiplied  by  a  factor  of  -1  as  shown  in  Equation  (j3.24[) . 

vj  =  c^c£v2  +  c?ct[vjx]^  +  v„  V  fc  e  [1,3]  (3.24) 

Differentiating  Equation  (13. 24[)  with  respect  to  if)  yields  an  expression  for  the  lin¬ 
earized  measurement  matrices,  Hk. 


Hk{ti)  =  C(Cyt-)[vM  V  fc  £  [1,3]  (3.25) 

3. 1.3. 2  Measurement  Noise  Strength.  In  order  to  obtain  the  Kalman 
gain  matrices  that  are  used  to  incorporate  the  measurements  into  the  estimated  state 
vector,  the  strengths  of  the  measurement  noise  vectors  must  also  be  known.  For  this 
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work,  the  measurement  noises  are  characterized  as  zero-mean,  white  and  Gaussian 
with  autocorrelations  given  by: 


where 


E[vk(ti)vl(tj)]=  RkSij  Vfce[l,3] 
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■k  ~ 


0.01 

0 

0 
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0.01 
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0.01 


V  ke  [1,3] 


(3.26) 


(3.27) 


This  equates  to  a  standard  deviation  of  0.1  in  each  of  the  measured  unit  pointing 
vectors’  Cartesian  coordinates. 

With  the  linearized  measurement  matrices,  Hk(ti ),  and  measurement  noise 
strength  matrices,  Rk,  defined,  Kalman  gains  for  each  measurement  can  be  com¬ 
puted  using  Equation  (12.55H.  These  Kalman  gains  are  then  used  in  Equation  (I2.66P 
to  update  perturbation  state  estimate  for  each  of  the  three  orthogonal  vanishing  di¬ 
rections. 


3.1.4  Obtaining  Vanishing  Point  Measurements.  Now  that  a  dynamics 
model  and  a  measurement  model  have  been  established  and  the  method  for  incor¬ 
porating  measurements  into  the  attitude  estimate  is  in  place,  all  we  lack  are  the 
actual  measurements  themselves.  This  section  describes  how  measurements  of  each 
vanishing  point  are  obtained  from  a  digital  image. 

3. 1.4-1  Edge  Detection  Method  Selection.  The  first  step  in  processing 
an  image  to  find  vanishing  points  is  to  extract  edges.  As  discussed  in  Section  12.5.11 
there  are  various  methods  for  performing  this  step.  Each  of  the  four  edge  detection 
methods  presented  in  Section  12.5.11  were  performed  on  several  hallway  images  to  de¬ 
termine  which  would  be  the  preferable  method  to  use.  An  image  of  a  hallway  taken 
with  the  host  vehicle’s  camera  and  the  results  of  performing  all  four  methods  of  edge 
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detection  are  shown  in  Figure  13.21  Because  the  canny  edge  detector  returned  more 
edges,  in  particular  useful  edges  such  as  the  borders  of  the  ceiling  tiles,  the  canny 
edge  detector  was  chosen  for  processing  the  images  collected  during  experiments. 

3. 1-4-2  Line  Detection  Method  Selection.  After  edges  have  been  found 
in  an  image,  the  next  step  towards  identifying  the  vanishing  points  is  to  extract 
straight  lines.  To  this  end,  two  of  the  line  detection  methods  discussed  in  Sectionl2.5.2l 
were  evaluated  to  determine  which  is  better  suited  for  this  particular  application. 

While  the  Hough  transform  is  a  widely-used  method  for  extracting  straight  lines 
in  images,  the  comparison  presented  by  Kessler,  et  al.  in  HU  shows  moderately  re¬ 
duced  computation  time  in  using  a  connected-components  method  to  accomplish  this 
task.  Because  the  microcomputer  on  board  the  host  vehicle  has  limited  capacity  for 
image  processing,  a  fast  line  finding  technique  can  provide  distinct  advantages  over 
other,  slower  methods.  In  light  of  this  information,  a  connected  components  line 
detection  process  as  described  by  Kosecka  and  Zhang  [19]  was  implemented  and  com¬ 
pared  with  the  Hough  transform  method  to  determine  which  is  faster.  Both  methods 
were  executed  using  Matlab  R2009b  software  on  a  2.5  GHz  Windows®  computer. 
The  resolution  of  the  Hough  accumulator  was  set  at  one  pixel  for  p  and  one  degree 
for  9,  and  sixteen  equally-spaced  gradient  direction  regions  were  used  for  the  con¬ 
nected  components  algorithm.  The  running  times  of  each  line  detection  method  for 
four  different  images  captured  from  the  quadrotor’s  onboard  camera  are  shown  in 
Table  13.11 

Table  3.1:  Line  detection  times.  Two  different  methods’  processing  times  for  de¬ 

tecting  lines  in  several  images  captured  with  the  host  vehicle’s  camera  are  shown 
here.  _ 


Image 

Connected  Components 

Hough  Transform 

Image  1 

0.26s 

0.37s 

Image  2 

0.27s 

0.26s 

Image  3 

0.26s 

0.29s 

Image  4 

0.28s 

0.11s 
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(a)  Hallway  image 


(d)  Sobel  (e)  Canny 

Figure  3.2:  Edge  detection  comparison.  The  four  indicated  edge  detection  methods 
were  used  to  produce  subfigures  (b)  through  (e)  from  the  image  in  subfigure  (a).  The 
canny  edge  detection  method  returns  more  edges  than  the  others. 
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The  images  used  in  this  comparison  are  a  different  resolution  than  the  ones  used 
by  Kessler  et  ah,  and  the  specific  implementation  of  each  method  is  likely  different  as 
well.  These  two  factors  account  for  the  differing  results  obtained  here.  Though  nei¬ 
ther  method  is  always  faster  than  the  other,  the  Hough  transform  was  chosen  as  the 
line  detection  method  for  this  research  because  both  the  Matlab  and  OpenCV  soft¬ 
ware  packages  contain  functions  for  implementing  it  within  their  respective  libraries. 
With  the  line  detection  method  selected,  the  vanishing  point  detection  process  can 
be  developed. 

3. 1.4-3  Expressing  Lines  with  Unit  Normals.  In  order  to  find  a  vanish¬ 
ing  point  from  the  lines  detected  by  the  Hough  transform,  each  line  is  first  represented 
with  a  unit  normal  vector  as  explained  in  Section  12.5.2.41  This  representation  is  ob¬ 
tained  by  removing  the  distortion  from  the  pixel  coordinates  of  each  line’s  endpoints, 
converting  the  undistorted  pixel  coordinates  to  camera  frame  line-of-sight  vectors 
with  the  Tcpix  matrix,  and  performing  the  cross  product  of  each  resulting  pair.  Each 
cross  product  is  then  divided  by  its  magnitude  to  obtain  a  unit  vector  as  shown  in 
Equation  (12.371). 

3. 1-4-4  Incorporating  Prior  Knowledge.  Vanishing  points  are  found 
by  searching  for  places  where  many  lines  mutually  intersect.  Not  every  line  in  the 
image  passes  through,  or  even  near,  a  particular  vanishing  point,  so  lines  that  are 
distant  from  where  the  vanishing  point  is  likely  to  lie  need  not  be  included  in  the 
search  for  that  vanishing  point.  To  determine  a  likely  area  for  finding  a  vanishing 
point,  an  initial  prediction  is  made  using  Equation  (13.28I). 

vJUi(fi)  =  CtCi(t-)y"  (3.28) 

Only  lines  that  are  near  this  prediction  are  used  to  find  the  measured  vanishing  point. 

Since  the  magnitude  of  the  dot  product  between  a  line  and  a  point  indicates 
how  close  the  two  are  to  one  another,  only  lines  whose  dot  product  with  the  predicted 
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vanishing  point  has  a  magnitude  less  than  a  statistically-determined  threshold  are 
considered  to  correspond  to  the  vanishing  point  of  interest.  The  dot  product  of  two 
unit  vectors  is  equivalent  to  the  cosine  of  the  angle  between  them,  and  a  point  that 
lies  on  a  particular  line  will  be  perpendicular  to  that  line’s  unit  normal  vector.  This 
means  that,  by  definition,  the  angle  between  a  line’s  unit  normal  and  any  point  on  the 
line  is  90  degrees.  Thresholding  on  the  magnitude  of  the  dot  product  then  amounts 
to  determining  a  maximum  angular  deviation  from  90  degrees  between  the  predicted 
vanishing  point  and  each  line’s  unit  normal.  The  angular  threshold  chosen  for  this 
work  is  5  degrees,  which  corresponds  to  a  maximum  dot  product  magnitude  of  0.087. 
Figure  13.31  shows  on  an  artificial  edge  image  which  lines  are  within  5  degrees  of  a 
particular  point.  The  lines  within  the  dot  product  threshold  are  called  “support 


Figure  3.3:  Lines  near  a  point.  The  lines  shown  in  blue  are  within  5  degrees  of  the 
marked  point. 


lines”  for  that  particular  vanishing  point.  With  the  set  of  support  lines  determined, 
a  vanishing  point  can  be  found  from  its  contents. 
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The  linearized  measurement  equation  (13 .2  5ft  and  measurement  prediction  equa¬ 
tion  (13.281)  both  require  that  a  body-to-camera  frame  DCM,  Cl,  be  known  before  they 
can  be  evaluated.  No  actual  air  vehicle  was  used  to  carry  the  test  camera  during  this 
research,  so  the  camera  itself  is  treated  as  the  vehicle.  As  described  in  Section  [2~2l  the 
rc-axis  of  the  vehicle  body  reference  frame  is  typically  oriented  towards  the  direction  of 
travel,  which  would  be  out  the  lens  of  the  camera  in  this  case.  The  z-axis  of  the  cam¬ 
era  reference  frame  is  typically  oriented  out  the  lens  as  well.  To  maintain  consistency 
with  these  conventions,  the  body-to-camera  frame  DCM  simply  performs  a  90-degree 
rotation  about  the  body-frame’s  y-axis.  This  DCM  is  shown  in  Equation  (I3.29p. 
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3. 1-4  -5  RAN  SAC  for  Vanishing  Point  Detection.  Recall  that  the 
RANSAC  algorithm  uses  a  random  minimum  subset  of  experimental  data  to  build  a 
model  and  then  finds  how  much  more  of  the  data  fit  this  model.  After  a  statistically 
determined  number  of  random  subsets  have  been  examined,  the  one  with  the  largest 
consensus  set  is  declared  the  best  fit  to  the  data  and  the  model  is  refined  using  only 
members  of  this  consensus  set,  i.e. ,  inkers. 

A  vanishing  point  is  a  point  where  many  lines  mutually  intersect.  The  smallest 
minimum  subset  to  model  such  a  point  is  the  intersection  of  only  two  lines.  Therefore, 
two  of  the  support  lines  are  selected  at  random,  and  their  intersection  calculated  by 
evaluating  the  normalized  cross  product  of  their  unit  normal  vectors  as  shown  in 
Equation  (12.381) .  This  intersection  is  then  compared  to  the  complete  set  of  support 
lines  to  find  how  many  other  lines  pass  near  it.  Once  again,  the  dot  product  is  used 
to  make  this  comparison.  The  magnitude  of  the  dot  product  between  the  initial 
intersection  point  and  the  remaining  support  lines  is  calculated  and  those  below  a 
threshold  are  added  to  the  inkers  of  that  consensus  set.  In  this  comparison,  only  lines 
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whose  planar  normals  are  within  0.5  degrees  of  90  degrees  from  the  intersection  are 
considered  inliers  to  that  candidate  intersection.  This  angular  threshold  corresponds 
to  a  maximum  dot  product  magnitude  of  0.0087.  Figure  E31  shows  which  lines  in  an 
artificial  edge  image  are  within  0.5  degrees  of  an  initial  candidate  intersection. 


Figure  3.4:  Inlier  lines.  The  lines  shown  in  red  are  inliers  to  the  intersection  marked 
with  the  green  circle. 


Since  the  proportion  of  inliers  is  not  known  from  the  outset,  the  iterative  ap¬ 
proach  for  determining  the  minimum  number  of  random  pairings  to  examine  described 
at  the  end  of  Section  12.5.3.21  must  be  used.  An  initial  estimate  of  1  is  used  for  the 
proportion  of  outliers,  e.  This  results  in  an  initial  value  of  infinity  for  the  minimum 
number  of  random  pairings  to  try,  but  the  number  is  quickly  reduced  as  larger  con¬ 
sensus  sets  are  found. 

In  some  instances,  there  may  not  be  a  single  largest  consensus  set,  i.e.,  multiple 
candidate  intersections  may  have  the  same  number  of  inliers.  For  such  cases,  a  tie- 
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breaking  criterion  is  necessary  to  determine  the  best  consensus  set.  How  close  the 
inliers  are  to  their  corresponding  candidate  intersection  provides  this  criterion.  The 
set  with  the  tightest  grouping  as  determined  by  the  sum  of  the  dot  products  between 
each  line  and  the  candidate  intersection  is  chosen  as  the  better  consensus  set. 

In  other  instances,  the  largest  consensus  set  determined  from  the  set  of  support 
lines  may  only  contain  the  original  two  lines  used  to  find  the  candidate  intersection. 
In  this  case,  the  algorithm  ends  in  failure  and  no  vanishing  point  is  declared  in  that 
particular  direction  for  that  image. 

3. 1-4  -6  Vanishing  Point  Estimate  Refinement.  If  a  consensus  set  is 
found,  the  final  step  in  finding  the  vanishing  point  is  to  find  the  point  that  best  fits 
the  consensus  set  of  lines.  This  is  accomplished  using  static  optimization  techniques. 
Such  techniques  begin  with  the  definition  of  a  cost  function.  The  best  estimate  of  the 
vanishing  point  from  a  group  of  lines  is  the  point  which  minimizes  the  dot  product 
with  each  of  the  lines  in  the  consensus  set.  This  can  be  expressed  mathematically  as: 

Nk 

=  v  fee  [1,3]  (3.30) 

2=1 

where  Nk  is  the  number  of  lines  in  the  consensus  set  for  the  /c-th  vanishing  point  and 
Jk  is  the  total  cost  associated  with  vk.  Unfortunately,  minimizing  the  cost  function 
shown  in  Equation  (I3.30|)  will  always  lead  to  the  trivial  solution,  =  03*1 .  While  this 
solution  does,  in  fact,  minimize  </&,  it  does  not  reveal  the  direction  of  the  vanishing 
point.  The  trivial  solution  can  only  be  avoided  by  adding  a  constraint  to  the  cost 
function  through  the  use  of  a  Lagrange  multiplier,  g.  The  constraint  to  be  added  in 
this  case  is  to  require  that  v*,  must  be  of  unit  length.  Equation  (13.31  j)  shows  the  new, 
constrained  cost  function. 

Nk 

Jk  =  £(^V*J2  +  rj(v2kx  +  v2ky  +  vl  -  1)  V  k  c[l,3]  (3.31) 

2=1 
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The  optimized  solution  is  then  found  by  implementing  a  gradient  descent  technique 
to  solve  for  the  values  of  v*,  and  r/  which  satisfy: 

VVfc,^fe  =  04xi  V  A:  e  [1,3]  (3.32) 

The  candidate  intersection  is  used  as  the  starting  point  for  the  iterative  process  of 
determining  the  value  of  v&  that  minimizes  J This  solution  is  the  measurement 
realization  that  is  used  to  update  the  Kalman  filter. 

Once  a  vanishing  point  is  found,  the  inlier  lines  associated  with  it  are  removed 
from  the  total  set  of  lines  in  the  image  before  moving  on  to  search  for  the  next  van¬ 
ishing  point.  This  is  done  to  reduce  the  amount  of  extraneous  information  presented 
to  the  vanishing  point  detection  process  in  subsequent  directions  and  is  valid  because 
the  only  line  which  would  legitimately  pass  through  two  vanishing  points  is  a  horizon 
line.  Since  the  host  vehicle  is  assumed  to  be  operating  indoors,  the  horizon  will  not 
be  in  the  camera’s  view. 

3. 1.4-7  Direction  Disambiguation.  Every  pair  of  lines  in  an  image  in¬ 
tersects  in  exactly  two  places,  one  180  degrees  offset  from  the  other.  This  is  evidenced 
by  the  effect  of  reversing  the  order  of  the  cross  product  in  Equation  (12.381).  Due  to  the 
random  nature  of  the  initial  pairing  of  image  lines  performed  by  the  RANSAC  method, 
it  is  possible  for  the  algorithm  to  converge  on  a  vanishing  point  that  is  opposite  the 
one  being  sought.  To  obtain  the  correct  measurement  under  these  circumstances  and 
prevent  degrading  the  overall  attitude  estimate,  each  measurement  is  compared  with 
the  measurement  prediction  to  confirm  that  it  is  in  the  same  hemisphere.  If  the  com¬ 
ponent  of  the  predicted  vanishing  point  with  the  largest  magnitude  does  not  have 
the  same  sign  as  its  corresponding  component  in  the  measured  vanishing  point,  the 
measured  vanishing  point  is  flipped  by  multiplying  it  by  a  factor  of  -1.  This  ensures 
that  the  predicted  and  measured  vanishing  points  are  always  in  the  same  hemisphere. 
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3. 1-4-8  Residual  Monitoring.  Occasionally,  it  is  possible  for  the  van¬ 
ishing  point  detection  algorithm  to  commit  large  errors.  For  instance,  measurement 
noise  may  cause  the  camera’s  perception  of  a  tight  group  of  lines  that  are  horizon¬ 
tally  parallel  in  the  world  to  appear  to  cross  on  the  image  plane.  In  such  a  case,  the 
^-component  of  the  horizontal  vanishing  point  will  be  found  to  be  nearly  equal  to  1 
when  the  true  vanishing  point  has  a  ^-component  nearly  equal  to  0.  This  presents  a 
very  large  measurement  residual  and  the  measurement  should  be  ignored.  The  diag¬ 
onals  of  the  residual  covariance  matrix,  H  k(U)  P^(t^)  (U)  +  Rk ,  can  be  used  to 
filter  out  such  erroneous  measurements.  A  measurement  that  yields  a  residual  whose 
magnitude  in  any  of  its  components  is  larger  than  3-cr  can  be  safely  ignored  with 
99.6%  probability.  Stated  another  way,  discarding  measurements  which  lie  further 
than  3-cr  from  the  prediction  will  only  eliminate  a  valid  measurement  0.4%  of  the 
time. 

3.2  Experimental  Methods 

This  section  presents  the  experimental  methods  used  to  evaluate  the  attitude 
estimation  technique  described  in  Section  13.11  The  topics  presented  here  include  a 
description  of  the  test  equipment  and  an  outline  of  the  procedures  used  to  acquire 
test  data. 

3.2.1  Equipment.  A  variety  of  test  equipment  was  used  to  collect  data 
for  evaluating  the  attitude  estimation  method  presented  in  Section  13.11  The  system 
under  test  consists  primarily  of  the  same  models  of  MEMS  IMU  and  camera  with 
which  the  proposed  host  vehicle  is  equipped.  These  sensors  and  their  associated 
circuitry  are  housed  inside  an  approximately  80  mm  x  60  mm  x  160  mm  plastic  box 
as  shown  in  Figure  13.51  This  sensor  box  serves  as  a  mock-up  of  the  quadrotor  for 
which  the  attitude  estimation  algorithm  is  intended.  Additionally,  a  tactical-grade 
IMU  was  used  to  provide  more  accurate  attitude  data  than  the  estimates  provided  by 
the  Kalman  filter  for  later  comparison. 
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Figure  3.5:  System  under  test.  The  system  under  test  consists  of  the  IMU  and 

camera  shown  here  housed  inside  a  plastic  box. 
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3. 2. 1.1  MEMS  IMU.  The  MEMS  IMU  used  onboard  the  proposed 
host  vehicle  is  the  ADIS  16355  produced  by  Analog  Devices,  Inc.  It  is  a  complete 
triple-axis  inertial  unit  contained  inside  an  approximately  23  mm  x  23  mm  x  23  mm 
cube.  Its  small  size,  light  weight  and  low  power  requirement  make  it  an  attractive 
option  for  applications  such  as  the  one  in  this  thesis. 

3.2. 1.2  Tactical- Grade  IMU.  The  tactical-grade  IMU  used  to  compare 
performance  with  the  attitude  estimation  technique  is  the  HG1700-AG58  manufac¬ 
tured  by  Honeywell.  This  is  a  much  higher  performance  device  than  the  MEMS  IMU, 
as  it  is  equipped  with  ring  laser  gyroscopes.  While  it  provides  much  more  accurate 
angular  rate  measurements  than  the  MEMS  IMU,  it  is  also  much  larger,  heavier  and 
requires  significantly  more  power.  The  specifications  of  both  the  MEMS  and  tacti¬ 
cal  IMUs  as  published  by  their  respective  manufacturers  are  shown  in  Table  13.21  for 
side-by-side  comparison. 


Table  3.2:  IMU  specifications 


Parameter  (units) 

MEMS  IMU 

Tactical  IMU 

Sample  rate  (Hz) 

<  819 

100 

Input  range  (deg/s) 

±300 

±1000 

Gyro  rate  bias  (deg/hr) 

54 

1 

Angular  random  walk  (deg/ \/hr) 

4.2 

0.125 

Dimensions  (mm) 

23  x  23  x  23 

168  x  196  x  146 

Weight  (g) 

16 

4500 

Power  required  (W) 

<  0.3 

~8 

3.2. 1.3  Camera.  The  camera  used  onboard  the  proposed  host  vehicle 
is  the  Webcam  Pro  9000  produced  by  Logitech.  It  is  equipped  with  a  two  megapixel 
sensing  array  and  the  associated  optics.  There  is  also  a  dynamic  autofocus  feature, 
but  for  this  application,  the  autofocus  was  disabled.  While  disabling  the  autofocus 
sometimes  results  in  blurred  images,  it  ensures  that  the  focal  length  remains  constant. 
The  constant  focal  length  facilitates  the  use  of  a  single  set  of  calibration  parameters 
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for  the  image  processing  tasks.  The  camera  was  used  to  capture  640  x  480  resolution 
grayscale  images  for  the  optical-aiding  portion  of  the  Kalman  filter. 

3. 2. 1-4  Peripherals.  In  addition  to  the  primary  test  equipment  already 
described,  various  pieces  of  secondary  equipment  were  also  requisite  in  performing 
this  research.  Power  to  both  the  tactical  IMU  and  inertial-optical  sensor  box  was 
provided  by  a  standard  12V  car  battery  connected  to  an  800- Watt  power  inverter. 
Novatel,  Inc.’s  synchronized  position,  attitude  and  navigation  (SPAN™)  combined 
global  navigation  satellite  system/inertial  navigation  system  (GNSS/INS)  receiver 
was  used  to  interface  with  the  tactical  IMU.  Also,  two  laptop  computers  were  used 
to  record  the  data  generated  by  the  two  sensor  platforms.  A  wheeled  cart  enabled 
mobility  of  the  entire  configuration  for  dynamic  test  events.  Lastly,  pieces  of  8020 
aluminum  were  used  to  construct  a  rigid  frame  to  mount  the  sensor  box  and  tactical 
IMU  together  and  also  to  provide  a  structure  from  which  this  joint  apparatus  could 
be  suspended.  The  combined  testing  rig  is  shown  in  Figure  13761 

3.2.2  Procedures.  The  experimental  procedures  for  collecting  the  data  used 
to  evaluate  the  attitude  estimation  technique  consisted  primarily  of  two  main  tasks- 
carnera  calibration  and  motion  profile  development.  These  tasks  were  performed  to 
enable  an  evaluation  of  the  attitude  estimation  method. 

3.2.2. 1  Camera  Calibration.  Before  any  of  the  steps  beyond  line  de¬ 
tection  in  the  image  aiding  process  can  be  implemented,  an  intrinsic  camera  matrix 
and  optical  distortion  parameters  must  be  known.  These  are  found  by  performing  a 
calibration  of  the  test  camera.  Multiple  images  of  a  camera  calibration  board  were 
captured  at  varying  range  and  orientation  and  then  processed  using  Bouget’s  Camera 
Calibration  Toolbox  for  Matlab  [3J  to  extract  the  desired  parameters.  The  results  are 
shown  in  Table  13.31 
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Figure  3.6:  Test  rig.  The  equipment  shown  here  were  used  to  collect  data  for 

evaluating  the  attitude  estimation  method  described  in  this  thesis. 


Table  3.3:  Camera  calibration  parameters.  Uncertainties  express  3-cr  boundaries. 


Parameter  (Units) 

Value 

Focal  length  (pixels) 

[528.6,  528.6]  ±  [0.3,  0.3] 

Principal  point  (pixels) 

[307.8,  224.1]  ±  [0.5,  0.5] 

Skew  factor 

3.5  x  10”4±  1.6  x  10~4 

Radial  distortion  coefficients 

[0.043,  -0.17,  0.075]  ±  [0.0026,  0.01,  0.012] 

Tangential  distortion  coefficients 

[-9  x  10"5,  7.3  x  10'4]  ±  [2.3  x  10~4,  2.1  x  10'4] 
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3. 2. 2. 2  Motion  Profile.  Since  the  attitude  estimation  algorithm  is 
intended  for  use  on  a  flying  vehicle,  a  test  profile  was  developed  to  mimic  the  motion 
of  flight.  The  combined  sensor  box  and  HG1700  IMU  apparatus  was  used  to  represent 
the  vehicle.  A  rectangle  was  marked  on  the  floor  with  electrician’s  tape  outlining 
the  apparatus’s  starting  position  so  that  it  could  be  returned  to  roughly  the  same 
orientation  for  each  of  15  sample  runs.  Approximately  60  seconds  of  static  data  were 
collected  at  the  beginning  of  each  sample  run  before  the  mock  vehicle  was  lifted  off 
the  floor  and  hung  from  the  wheeled  cart  as  shown  in  Figure  13761  Then,  the  cart  was 
pushed  through  one  hallway  towards  an  intersection  with  another,  clockwise  around 
the  corner,  and  through  the  second  hallway,  stopping  approximately  4.5  meters  shy 
of  its  end.  This  approximate  trajectory  is  depicted  in  Figure  13.71  After  any  swinging 
motion  in  the  mock  vehicle  had  subsided,  another  60  seconds  of  static  data  were 
collected  at  the  end  of  each  run. 


Figure  3.7:  Pathway  through  halls.  The  test  apparatus  began  at  the  origin  of  the 
Earth-fixed  navigation  frame  and  approximately  followed  the  indicated  path  through 
the  hallways. 
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3.3 


Summary 


This  chapter  has  discussed  the  development  of  the  attitude  estimation  algorithm 
and  the  experimental  procedures  used  to  evaluate  it.  With  these  tasks  performed,  the 
test  data  can  be  processed  and  the  results  presented  and  analyzed  in  the  next  chapter. 


IV.  Results  and  Analysis 


This  chapter  discusses  the  results  of  implementing  the  coupled  vanishing  point 
and  inertial  attitude  estimation  technique  on  the  data  that  were  collected  during 
experimentation.  The  methods  used  to  process  the  experimental  data  are  presented 
in  Section  14.11  the  results  of  the  vanishing  point  detection  procedure  are  presented 
and  analyzed  in  Section  14.21  and  the  attitude  estimates  are  presented  and  discussed 
in  Section  14.31 

4-1  Data  Processing 

After  the  experiments  described  in  Chapter  IIIII  were  performed,  the  data  col¬ 
lected  were  postprocessed  using  The  Mathworks,  Inc.’s  Matlab  software.  The  data 
processing  procedures  include  calculating  the  gyro  biases,  generating  attitude  profiles 
from  the  unaided  MEMS  inertial,  vision-aided  MEMS  inertial,  and  unaided  tactical 
inertial  data,  and  determining  the  errors  in  the  estimated  attitude  profiles. 

4-1.1  Gyro  Bias  Calculation.  Equation  Q2T3j)  expresses  the  gyroscope  mea¬ 
surements  as  the  sum  of  the  true  relative  rotation  rate  between  the  inertial  and  body 
reference  frames,  the  vector  of  gyro  biases,  and  zero-mean,  white,  Gaussian  noise. 
During  the  initial  static  portion  of  each  data  run,  the  test  apparatus  is  known  to  be 
at  rest.  Thus,  because  the  Earth’s  rotation  rate  can  be  neglected,  the  gyro  biases  can 
be  directly  estimated.  The  initial  static  portion  of  each  run  has  a  total  duration  of 
approximately  60  seconds,  in  which  time  the  Earth  will  have  rotated  approximately 
0.25  degrees.  This  total  rotation  is  within  the  expected  margin  of  error  for  the  esti¬ 
mation  technique,  so  neglecting  the  Earth  rate  is  justified.  Assuming  an  Earth  turn 
rate  equal  to  zero  and  a  stationary  inertial  sensor  means  that  the  true  rotation  rate 
can  also  be  assumed  to  equal  zero.  Therefore,  the  bias  is  the  only  factor  that  will 
contribute  to  the  mean  of  the  gyro  measurements  during  the  initial  stationary  por¬ 
tion  of  each  data  run.  As  was  discussed  in  Section  12.3.21  the  bias  will  be  treated  as 
a  fixed,  deterministic  quantity  for  the  duration  of  a  single  test  run.  This  means  that 
the  vector  of  gyro  biases  for  a  particular  run  can  be  determined  by  calculating  the 
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mean  of  the  first  minute’s  worth  of  inertial  measurements. 


bb 


(4.1) 


Figure  HU]  shows  the  biases  that  were  determined  from  each  of  the  15  test  runs. 
The  largest  biases  are  in  the  measured  rotation  rate  about  the  mock  vehicle’s  pitch, 
i.e. ,  the  body  frame’s  y,  axis.  These  correspond  to  an  average  measured  rotation 
rate  of  approximately  4  degrees  per  second  obtained  while  the  IMU  was  held  station¬ 
ary.  Clearly,  these  biases  can  degrade  the  overall  attitude  estimate  if  they  are  not 
compensated  for. 
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Figure  4.1:  Gyro  biases.  The  biases  in  each  of  the  3  MEMS  gyros  calculated  from 
each  of  the  15  test  runs  are  shown  here. 


4-1.2  Unaided  Inertial  Attitude  Profile  Generation.  Once  the  gyro  biases 
are  known,  unaided  inertial  attitude  profiles  can  be  calculated.  Beginning  from  an 
initial  attitude  estimate,  unaided  attitude  profiles  based  on  measurements  taken  from 
either  the  MEMS  or  tactical  IMU  are  generated  using  Equation  (13.71) .  The  attitude 
profiles  originating  from  the  tactical  inertial  data  are  the  most  accurate  available  for 
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comparison  with  attitude  estimates  from  other  sources.  The  HG1700  IMU  used  to 
obtain  these  measurements  has  a  drift  rate  of  only  1  degree  per  hour,  and  a  test  run 
has  a  duration  of  only  approximately  3.5  minutes.  In  this  time,  the  tactical  inertial 
solution  will  have  drifted  only  0.06  degrees.  Therefore,  the  tactical  inertial  attitudes 
will  be  assumed  to  be  equivalent  to  truth  for  the  purpose  of  calculating  attitude 
errors.  An  example  of  one  unaided  MEMS  inertial  profile  and  the  corresponding 
tactical  inertial  profile  are  shown  in  Figure  EOl  These  attitude  profiles  are  expressed 
in  terms  of  Euler  angles.  Note  the  drift  in  the  unaided  MEMS  inertial  profile.  It  is 
this  drift  that  we  wish  to  constrain  through  the  use  of  the  vision-aiding  Kalman  filter. 

4-1.3  Error  Calculation.  A  method  for  determining  attitude  errors  is  nec¬ 
essary  to  be  able  to  evaluate  the  accuracy  of  any  of  the  attitude  profiles  developed 
during  this  research.  Equations  (13.  lj)  and  (13. 2p  can  be  combined  to  facilitate  error 
calculation  as: 


(4.2) 


Solving  this  equation  for  the  vector  of  error  angles,  i/y  yields: 


i/.x=ln  (CJCj) 


(4.3) 


The  body-to-navigation  frame  DCM  calculated  from  the  tactical  inertial  data  is  sub¬ 
stituted  for  Cl  before  calculating  the  matrix  logarithm  on  the  right  side  of  Equa¬ 
tion  (I4.3[).  Finally,  collapsing  the  skew-symmetric  matrix  gives  the  vector  of  errors, 


Figure  14.31  shows  the  results  of  applying  this  error  calculation  method  to  the 
unaided  inertial  profile  shown  in  Figure l4~2l  Again,  the  unbounded  drift  in  the  unaided 
attitude  profile  is  clearly  evident.  The  abrupt  change  in  all  three  components  of  the 
error  vector  at  about  100  seconds  corresponds  to  when  the  simulated  vehicle  turned 
the  corner  from  one  hallway  to  another.  This  phenomenon  is  present  in  all  of  the  15 
test  runs. 
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Figure  4.2:  Unaided  inertial  attitude  profile.  The  unaided  MEMS  inertial  attitude 
profile  shown  here  drifts  unbounded  due  to  additive  inertial  measurement  noise. 
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Figure  4.3:  Unaided  MEMS  inertial  errors.  The  errors  in  the  unaided  MEMS 

inertial  profile  shown  in  Figure  [4721  are  displayed  here. 
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4-1-4  Process  Noise  Characterization.  With  an  error  calculation  method 
in  place,  the  discrete-time  process  noise  strength,  Qd,  can  be  determined.  This  is 
accomplished  by  performing  an  analysis  of  the  ensemble  of  errors  in  the  complete  set 
of  unaided  MEMS  inertial  attitude  solutions.  The  method  described  in  Section  14.1.31 
is  used  to  find  the  errors  in  the  unaided  MEMS  inertial  solutions  from  each  of  the  test 
runs,  and  the  resulting  ensemble  of  errors  is  used  to  calculate  standard  deviations  for 
the  full  set  of  test  data.  Figure  14.41  shows  the  ensemble  of  unaided  MEMS  inertial 
errors  for  all  3  gyroscopes  and  all  15  data  runs  along  with  the  calculated  ensemble 
standard  deviations.  The  time  segment  shown  begins  after  the  initial  60  second  static 
period  over  which  the  gyro  measurements  have  been  averaged  to  determine  the  gyro 
biases. 


Time  (s) 

Figure  4.4:  Unaided  MEMS  inertial  ensemble  errors.  The  errors  present  in  the 

unaided  MEMS  inertial  solutions  in  all  3  axes  from  all  15  test  runs  and  the  calculated 
ensemble  standard  deviations  are  shown  here. 

The  process  noise  strength  can  be  determined  by  finding  a  linear  fit  to  the 
ensemble  variances.  The  first-order  coefficient  of  a  linear  fit  to  these  variances  reveals 
the  rate  at  which  the  uncertainty  in  the  attitude  estimates  increases  when  the  MEMS 
inertial  data  are  unaided.  The  linear  fit  to  the  unaided  MEMS  inertial  ensemble 
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variances  from  60  to  120  seconds  is  shown  in  Figure  14.51  The  variances  over  this 
interval  increase  at  a  rate  of  approximately  0.63  deg2/s.  Multiplying  this  rate  by  the 
time-step  between  MEMS  inertial  measurements  (4.9  milliseconds)  yields  the  following 
value  for  the  discrete-time  process  noise  strength: 


Qd  - 


3.1  x  lO"3  0  0 

0  3.1  x  lO-3  0 

0  0  3.1  x  lO’3 


deg2 


(4.4) 


Figure  4.5:  Unaided  MEMS  inertial  variances.  The  process  noise  strength  is  de¬ 
termined  from  the  first-order  coefficient  of  the  linear  fit  to  the  unaided  inertial  error 
variances. 


4-2  Vanishing  Point  Detection 

With  the  gyro  biases  and  process  noise  strength  determined,  the  attitude  esti¬ 
mation  technique  described  in  Section  13.11  can  be  implemented  on  the  data  that  were 
collected  during  the  test  runs.  The  vanishing  points  found  in  the  images  captured  by 
the  camera  will  be  used  to  constrain  the  drift  in  the  MEMS  inertial  solutions.  There- 
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fore,  the  effectiveness  of  the  vanishing  point  detection  algorithm  is  discussed  in  this 
section  before  presenting  the  vision-aided  attitude  profiles  and  errors  in  the  section 
that  follows. 

Figure  14. hi  shows  an  image  that  was  captured  approximately  82  seconds  into  one 
of  the  test  runs.  In  this  example,  all  three  vanishing  points  have  been  identified,  and 
lines  which  have  been  declared  by  the  RANSAC  algorithm  to  be  inlicrs  to  any  of  the 
three  principal  vanishing  directions  are  highlighted.  Note  that  some  lines  returned 
by  the  Hough  transform  have  been  rejected  as  outliers.  Also,  the  predicted  vanishing 
point  in  the  direction  down  the  hall  and  the  corresponding  vanishing  point  found  in 
the  image  are  close  to  one  another,  resulting  in  a  small  measurement  residual. 


Figure  4.6:  Vanishing  points  found  in  a  sample  image.  All  three  vanishing  points 
have  been  identified  in  this  image.  The  magenta  circle  is  where  Vi  was  predicted  to 
be,  and  the  red  square  is  where  V]  was  found.  The  blue  lines  are  inlicrs  to  Vx,  the 
green  lines  are  inliers  to  v2,  the  red  lines  are  inliers  to  v3,  and  the  cyan  lines  are 
outliers. 


4-2.1  Measurement  Susceptibility  to  Noise.  Unfortunately,  the  ransac 
vanishing  point  detection  scheme  does  not  always  return  the  type  of  result  shown  in 
Figure  [03  An  example  of  an  image  in  which  v2  is  found  to  be  only  a  few  degrees  from 
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Vi  is  given  in  Figure  14771  Of  course,  V2  actually  lies  in  a  direction  nearly  parallel  to  the 
image  plane.  The  three  lines  identified  as  inliers  to  this  spurious  measurement  pass 
close  enough  to  the  prediction  of  v2  that  they  meet  the  criteria  for  being  classified  as 
support  lines.  However,  the  glancing  angle  at  which  the  camera  views  the  horizontal 
stripe  on  the  floor  introduces  significant  uncertainty  into  the  locations  of  the  lines 
defined  by  the  edges  of  the  stripe.  This  imprecision  is  what  makes  the  lines  appear 
to  intersect  on  the  image  plane. 


Figure  4.7:  Spurious  vanishing  point  measurement.  The  intersection  of  the  inliers 
to  v2  is  found  on  the  image  plane  when  the  true  vanishing  point  clearly  lies  far  from 
this  measurement. 

4-2.2  Measurement  Residuals.  The  sequence  of  residuals  in  the  Kalman 
filter  shows  how  close  the  measurements  are  to  their  respective  predictions.  This 
information  can  reveal  how  often  spurious  measurements  such  as  the  one  discussed 
in  Section  14.2.11  occur.  The  residual  monitoring  process  described  in  Section  13.1.4.81 
prevents  these  spurious  measurements  from  unduly  influencing  the  attitude  estimates. 
Recall  that  if  any  component  of  a  residual  is  found  to  lie  outside  of  three  standard 
deviations  from  zero,  the  corresponding  measurement  is  ignored  by  the  Kalman  filter 
and  treated  as  if  no  measurement  were  obtained  for  that  vanishing  point. 
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Figure  l4~8l  shows  the  sequences  of  residuals  and  their  corresponding  3 -a  bound¬ 
aries  from  one  of  the  test  runs.  The  numeral  subscript  on  a  residual  indicates  the 
vanishing  point  to  which  the  residual  corresponds,  e.g.,  5zi  is  the  residual  corre¬ 
sponding  to  the  measurement  of  Vi.  In  the  example  shown  in  the  figure,  spurious 
measurements  of  V2  frequently  occur  at  the  beginning  of  the  test  run  while  the  mock 
vehicle  is  resting  on  the  ground.  The  intersection  of  the  green  lines  in  Figure  14771  is  an 
example  of  one  such  measurement.  After  the  mock  vehicle  has  been  lifted  off  the  floor 
at  approximately  65  seconds  into  the  test  run,  the  residuals  for  V2  are  much  smaller. 

Another  observation  that  can  be  made  from  the  residuals  is  with  respect  to 
the  uncertainty  in  the  ^-components  of  the  vanishing  point  measurements.  When 
the  camera  is  looking  away  from  a  particular  vanishing  point,  there  is  much  greater 
uncertainty  in  the  ^-component  of  the  corresponding  measurement  than  when  the 
camera  is  pointed  toward  the  vanishing  point.  Looking  again  at  Figure  14.81  there  is 
much  greater  variability  in  the  ^-component  of  Sz2  before  the  mock  vehicle  turns  the 
corner  approximately  100  seconds  into  the  run  than  after.  After  the  corner,  the  camera 
is  peering  in  the  direction  of  V2,  and  the  fluctuations  in  the  ^-component  of  5z2  are  no 
longer  observed.  Now,  however,  the  z-component  of  bz\  varies  much  more  widely  than 
it  had  before.  The  z-component  of  5z%  exhibits  a  greater  degree  of  variability  than  the 
x  or  ^/-components  for  the  duration  of  the  test  run,  because  the  camera  is  never  looking 
in  the  direction  of  V3,  i.e. ,  straight  up  or  straight  down.  These  observations  provide 
evidence  that  when  the  camera  is  pointed  toward  a  particular  vanishing  direction, 
the  uncertainty  in  the  z-component  of  the  corresponding  vanishing  point  is  much 
smaller  than  the  z-components  of  the  other  two  vanishing  points.  The  3-cr  boundaries 
may  seem  large  somewhat  large,  particularly  over  portions  in  the  test  run  where  the 
residuals  are  consistently  small.  However,  it  will  be  shown  that  the  Liter’s  calculated 
uncertainties  closely  match  the  test  runs’  ensemble  variances. 

Lastly,  the  residuals  show  that  in  this  test  run  there  were  few  reliable  updates 
to  the  roll  estimate  while  the  mock  vehicle  was  resting  on  the  floor.  When  the  camera 
is  peering  in  the  direction  of  v1;  the  roll  axis  is  not  observable  using  measurements 
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Figure  4.8:  Measurement  residuals.  The  residuals  in  the  x ,  y  and  ^-components  of  all  three  vanishing  point  measurements 
for  a  single  test  run  are  shown  here  with  their  respective  3-cr  covariance  boundaries.  Any  measurements  outside  of  the  3-er 
threshold  are  rejected  and  are  not  incorporated  into  the  attitude  estimates. 


of  that  vanishing  point,  so  those  measurements  do  not  influence  the  roll  estimates. 
This  means  that  information  influencing  the  roll  axis  must  come  from  either  V2  or 
v3.  However,  many  of  the  measurements  of  v2  were  discarded  because  they  exhibited 
large  residuals,  and  few  measurements  of  v3  were  obtained.  Nonetheless,  even  under 
these  adverse  conditions,  the  roll  estimates  will  be  shown  to  remain  within  a  few 
degrees  of  the  tactical  inertial  roll  profile. 

4-3  Kalman  Filter  Attitude  Estimates 

Now  that  the  vanishing  point  detection  method  has  been  evaluated,  the  attitude 
profiles  obtained  by  coupling  the  MEMS  inertial  data  and  vanishing  point  measure¬ 
ments  will  be  presented  and  discussed.  Examples  of  an  attitude  profile  generated  by 
the  Kalman  filter  and  the  corresponding  attitude  profiles  calculated  from  the  tactical 
and  unaided  MEMS  inertial  measurements  are  shown  in  Figure  EPJ1  As  was  discussed 
in  Section  14.2.21  the  roll  estimate  is  degraded  during  the  initial  static  portion  of  the 
test  run  due  to  infrequent  and  inaccurate  measurements  of  vanishing  points  v2  and 
v3.  However,  in  the  long  term,  the  attitude  estimates  in  all  three  axes  are  significantly 
more  accurate  than  the  unaided  MEMS  inertial  estimates. 

Performing  the  error  calculation  procedure  discussed  in  Section  14.1.31  on  both 
the  unaided  MEMS  inertial  and  Kalman  filter  data  from  this  test  run  yields  the 
results  shown  in  Figure  14.101  Again,  the  Kalman  filter  provides  greater  accuracy 
and  stability  in  the  long  term.  However,  another  phenomenon  can  also  be  seen  in 
these  error  profiles.  There  is  an  overall  bias  in  each  of  the  Kalman  filter’s  attitude 
estimates  about  which  the  solutions  vary.  This  bias  will  be  discussed  in  greater  detail 
in  Section  14.3.21 

4-3.1  Kalman  Filter  Ensemble  Errors.  One  test  run  alone  cannot  charac¬ 
terize  the  nature  of  the  errors  in  the  Kalman  filter’s  attitude  solutions.  To  help  in 
understanding  the  filter’s  performance,  the  ensemble  of  errors  in  the  Kalman  filter’s 
attitude  estimates  for  all  fifteen  test  runs  and  the  corresponding  means  and  standard 
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Figure  4.9:  Kalman  filter  and  unaided  MEMS  inertial  attitude  profiles  from  1  run. 
The  attitude  profiles  calculated  from  the  Kalman  filter,  and  both  tactical  and  MEMS 
raw  inertial  data  are  shown  here.  Note  the  unbounded  long-term  drift  present  in  the 
attitude  profiles  from  the  unaided  MEMS  inertial  solution  has  been  constrained  in 
the  Kalman  filter  attitude  solution. 
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Figure  4.10:  Kalman  filter  and  unaided  MEMS  inertial  errors  from  1  run.  In  the 
long  term,  the  errors  in  the  Kalman  filter  solution  are  much  smaller  than  the  errors 
in  the  unaided  MEMS  inertial  solution. 
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deviations  are  shown  in  Figure  14.111  Comparing  the  ensemble  errors  in  the  Kalman 
filter’s  attitude  solution  shown  in  Figure  14.111  with  the  unaided  MEMS  inertial  en¬ 
semble  errors  shown  in  Figure  14.41  shows  that  the  long-term  improvement  exhibited 
by  the  single  test  case  discussed  earlier  is  present  in  all  of  the  test  runs.  The  unaided 
MEMS  solutions  have  a  standard  deviation  of  approximately  26.4  degrees  after  3.5 
minutes  which  will  only  continue  to  grow  over  time.  The  vision-aided  solutions,  on  the 
other  hand,  have  standard  deviations  of  approximately  1.5  degrees  in  the  pitch  and 
roll  axes,  and  approximately  0.9  degrees  in  the  yaw  axis  which  are  stable  long-term. 
Respectively,  these  represent  94%  and  96%  reductions  in  the  attitude  uncertainties. 
Furthermore,  the  long-term  stability  of  the  Kalman  filter’s  attitude  estimates  has 
entirely  eliminated  the  unaided  MEMS  inertial  solutions’  drift  rate  of  0.63  deg2/s. 

Another  interesting  comparison  to  make  is  between  the  uncertainties  calculated 
by  the  Kalman  filter  and  the  ensemble  standard  deviations.  Both  of  these  sequences 
are  shown  together  in  Figure  14.121  to  facilitate  this  comparison.  The  Kalman  filter 
uncertainty  profile  from  only  one  test  case  is  displayed,  for  clarity,  but  the  uncertain¬ 
ties  from  the  others  converge  to  nearly  the  same  steady  state  as  the  one  shown.  The 
standard  deviation  calculated  by  the  Kalman  filter  matches  the  ensemble  standard 
deviation  within  one  degree  for  most  of  the  time  segment  with  a  peak  difference  of 
about  2  degrees  observed  in  the  yaw  axis. 

While  the  level  of  accuracy  provided  by  the  Kalman  filter  demonstrates  a  vast 
improvement  over  the  unaided  inertial  solution,  it  would  likely  only  partially  meet  the 
requirements  for  indoor  flight.  An  inner  control  loop  for  a  rotary  vehicle  will  require 
accuracy  to  within  a  degree  or  two  in  the  roll  and  pitch  axes  to  control  lateral  motion. 
The  combined  effects  of  the  1.5  degree  standard  deviation  in  the  errors  and  overall  bias 
do  not  provide  for  a  solution  accurate  to  within  two  degrees  consistently.  However, 
there  would  be  a  larger  tolerance  for  errors  in  the  yaw  axis,  as  precise  heading  is  less 
important  than  relative  position  when  operating  indoors.  Also,  the  Kalman  filter’s 
yaw  solution  has  been  shown  to  be  more  accurate  than  the  pitch  and  roll  solutions. 
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Figure  4.11:  Kalman  filter  ensemble  errors.  The  ensemble  of  errors  in  the  atti¬ 

tudes  estimated  by  the  Kalman  filter  are  shown  here  with  their  means  and  standard 
deviations. 
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Figure  4.12:  Ensemble  and  Kalman  filer  standard  deviations.  The  steady-state 

standard  deviations  calculated  by  the  Kalman  filter  and  ensemble  standard  deviations 
are  close  to  one  another  over  the  entire  time  segment. 
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A  less  stringent  requirement  and  slightly  more  accurate  estimates  in  the  yaw  axis 
indicate  that  the  yaw  solution  may  be  sufficiently  accurate  for  such  an  application. 

4-3.2  Measurement  Bias.  As  mentioned  before,  there  is  a  bias  present  in  the 
Kalman  filter’s  attitude  estimates.  This  is  evidenced  by  the  non-zero  ensemble  means 
that  can  be  clearly  seen  in  Figure H. Ill  Each  of  the  two  sensors  used  to  provide  data 
to  the  Kalman  filter  is  a  possible  source  of  the  corruption  in  the  attitude  estimates. 
However,  the  MEMS  inertial  sensor’s  influence  can  be  removed  by  running  the  Kalman 
filter  using  the  tactical  inertial  measurements  without  changing  the  process  noise 
strength  matrix,  Q.  Essentially,  this  amounts  to  supplying  the  filter  with  as  close  to 
a  perfect  set  of  inertial  measurements  as  is  available,  while  still  allowing  the  vanishing 
point  measurements  to  influence  the  attitude  estimates  in  the  same  way  they  had 
with  the  MEMS  inertial  data. 

Figure  14.131  shows  the  ensemble  of  errors  that  are  obtained  when  the  tactical 
inertial  measurements  are  used  in  the  Kalman  filter.  The  solutions  obtained  when 
using  the  tactical  inertial  data  in  the  filter  exhibit  the  same  bias  that  is  evident  in  the 
filter’s  solutions  using  the  MEMS  inertial  measurements.  This  eliminates  the  MEMS 
inertial  sensor  as  the  source  of  the  bias,  leaving  only  the  optical  measurements  to 
blame. 

With  the  the  vanishing  point  measurements  pinpointed  as  the  source  of  the 
biases,  the  next  step  in  isolating  the  cause  is  to  try  to  fold  measurements  that  consis¬ 
tently  err  in  the  same  direction  and  with  the  same  magnitude.  This  was  accomplished 
by  visually  observing  the  vanishing  point  measurements  projected  onto  the  images  to 
determine  whether  erroneous  measurements  could  be  identihed.  The  vanishing  points 
affecting  the  roll  estimates  cannot  be  projected  onto  the  image  for  most  of  the  dura¬ 
tion  of  a  test  run  because  they  are  usually  oriented  nearly  parallel  to  the  image  plane, 
so  they  were  not  considered  under  this  review.  The  largest,  sustained  bias  observed 
in  either  the  pitch  or  yaw  axes  for  any  one  test  run  is  3  degrees  exhibited  downward 
direction  of  the  pitch  axis.  If  vanishing  point  identihed  by  the  detection  algorithm 
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Figure  4.13:  Kalman  filter  ensemble  errors  using  tactical  inertial  measurements. 

The  ensemble  statistics  exhibit  the  same  bias  that  is  present  in  the  solutions  obtained 
from  the  combined  MEMS  inertial  and  optical  measurements. 
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were  3  degrees  below  where  the  edges  of  the  hallways  appear  to  intersect,  it  would  be 
noticeable  to  a  human  observer  examining  the  image.  The  processed  images  with  the 
inlicr  lines  highlighted  as  in  Figures  [4.61  and  [4.71  from  the  test  run  with  the  3  degree 
pitch  bias  were  reviewed  and  no  such  measurements  were  observed. 

Time  limitations  and  resource  availability  did  not  facilitate  any  further  inves¬ 
tigation  into  the  source  of  the  measurement  biases.  However,  other  possible  causes 
may  include  non-orthogonality  of  the  walls,  ceilings  and  floors  and  straight  but  non¬ 
parallel  elements  in  the  environment.  Such  phenomena  violate  the  assumption  that 
the  environment  consists  primarily  of  mutually-orthogonal  planes  and  could  introduce 
error  into  the  vanishing  point  detection  and/or  attitude  estimation  process. 

4-4  Summary 

Shortcomings  notwithstanding,  the  coupling  of  vanishing  point  tracking  with 
inertial  measurements  has  been  shown  to  vastly  improve  vehicle  attitude  estimates. 
By  joining  two  inexpensive,  lightweight,  low-power  sensors,  a  drift-free  attitude  de¬ 
termination  method  that  is  accurate  to  within  a  few  degrees  is  attainable. 


V.  Conclusions  and  Future  Work 


This  chapter  summarizes  the  information  presented  earlier  in  this  thesis  and 
provides  a  few  suggestions  for  how  this  work  could  be  continued  and  improved. 

5.1  Conclusions 

Inertial  sensors  have  been  used  for  decades  to  provide  position  and  attitude 
information  to  various  users.  Unfortunately,  even  the  most  advanced  inertial  sensors 
are  subject  to  boundless  error  growth.  The  primary  objective  of  this  research  has 
been  to  demonstrate  a  method  for  constraining  the  errors  in  the  attitude  solution 
from  a  commercial  MEMS  inertial  sensor  through  the  use  of  computer  vision.  The 
two  sensors  are  used  in  harmony,  with  the  inertial  data  aiding  the  vision  process  and 
the  vision  data  aiding  the  inertial  process. 

The  combined  visual/inertial  attitude  estimation  method  was  developed  by  de¬ 
signing  an  extended  Kalman  filter  for  this  purpose.  The  filter’s  dynamics  model  was 
established  using  inertial  navigation  theory.  The  filter’s  measurement  model  was  es¬ 
tablished  beginning  with  the  Manhattan  world  assumption  from  [7],  A  Manhattan 
world  scene  contains  three  primary  groupings  of  parallel  lines  aligned  to  the  Man¬ 
hattan  grid.  The  projections  of  these  parallel  lines  onto  the  image  plane  intersect 
at  one  of  three  vanishing  points.  The  directions  of  the  vanishing  points  reveal  the 
camera’s  orientation  with  respect  to  the  scene,  and  are  used  to  update  the  Kalman 
filter’s  attitude  estimates. 

A  unique  method  for  detecting  vanishing  points  was  established  which  yields 
unit-length  Cartesian  3-vectors  indicating  each  vanishing  direction  expressed  in  the 
camera  reference  frame.  This  process  utilizes  the  RANSAC  concept  from  [9]  to  avoid 
having  to  find  every  possible  intersection  of  image  lines.  The  method  is  effective  at 
finding  vanishing  points  in  many  conditions  that  adhere  to  the  Manhattan  world  as¬ 
sumption,  but  some  line  geometries  and  measurement  noise  can  give  rise  to  erroneous 
measurements.  Monitoring  the  measurement  residuals  and  ignoring  gross  outliers 
prevents  such  spurious  measurements  from  influencing  the  attitude  estimates. 


An  experiment  was  developed  to  evaluate  the  extended  Kalman  filter’s  attitude 
estimates.  For  this  experiment,  an  indoor  motion  profile  mimicking  flight  was  se¬ 
lected  along  which  a  MEMS-grade  inertial  sensor,  tactical-grade  inertial  sensor,  and 
a  commercial  webcam  were  transported.  This  profile  guides  the  sensors  through  one 
hallway  towards  in  intersection,  clockwise  around  the  corner,  and  down  to  the  end 
of  a  second  hallway.  Data  collected  from  the  MEMS-grade  inertial  sensor  and  we¬ 
bcam  during  fifteen  test  runs  following  this  profile  were  post-processed  through  the 
extended  Kalman  filter,  and  the  filter’s  solutions  were  compared  with  the  solutions 
from  the  tactical  inertial  sensor  to  determine  their  accuracy. 

The  attitude  estimates  from  the  Kalman  filter  show  dramatic  improvement  over 
the  attitude  estimates  from  the  MEMS  inertial  sensor  alone.  After  only  3.5  minutes 
of  operation,  the  unaided  MEMS  inertial  estimates  have  a  standard  deviation  of  26.4 
degrees,  while  the  Kalman  filter’s  estimates  have  standard  deviations  of  1.5  degrees 
in  the  body  reference  frame’s  x  and  y- axes  and  only  0.9  degrees  in  the  z-axis.  Fur¬ 
thermore,  the  drift  rate  of  0.63  deg2/sec  in  the  unaided  MEMS  estimates  is  absent 
from  the  Kalman  filter’s  solutions. 

There  is  a  consistent  bias  in  the  Kalman  filter’s  attitude  estimates  from  each 
of  the  fifteen  test  runs.  The  MEMS  inertial  sensor  was  eliminated  as  a  possible 
cause,  leading  to  the  conclusion  that  the  bias  originates  from  the  vanishing  point 
measurements.  This  bias  has  the  greatest  magnitude  (~2.5  degrees)  in  the  vehicle 
body  reference  frame’s  y- axis. 

5.2  Future  Work 

There  are  various  ways  in  which  the  work  that  has  been  presented  could  be 
extended  or  improved.  Some  of  these  are  related  to  improving  or  modifying  the  van¬ 
ishing  point  detection  process,  and  others  involve  extending  the  attitude  estimation 
research  and  preparing  for  on-vehicle  implementation. 
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5.2.1  Vanishing  Point  Orthogonality.  The  attitude  estimation  technique 
described  in  this  thesis  relies  on  the  assumption  that  the  environment  in  which  a  host 
vehicle  is  operating  consists  primarily  of  many  planes  and  lines  oriented  in  one  of  three 
mutually  orthogonal  directions.  The  approach  to  vanishing  point  detection  presented 
herein  consists  of  looking  for  the  vanishing  point  in  each  of  the  three  principal  direc¬ 
tions  individually.  However,  it  may  prove  effective  to  search  for  the  complete  triad  of 
vanishing  points  all  at  once  instead  of  each  one  in  sequence.  In  J2S],  Rother  presents  a 
computationally  intensive  approach  to  finding  all  three  vanishing  directions  simulta¬ 
neously  in  which  every  possible  intersection  of  two  lines  from  the  image  is  examined. 
Combining  Rother’s  approach  with  the  RANSAC  method  presented  herein  may  prove 
effective  at  rapidly  obtaining  all  three  vanishing  directions  in  an  image. 

5.2.2  Vanishing  Point  Detection  Robustness.  During  one  of  the  test  runs 
that  were  performed,  a  pedestrian  entered  the  camera’s  field  of  view,  walking  from 
behind  the  camera  to  the  end  of  the  hall  and  turning  the  corner.  This  person’s  pres¬ 
ence  in  34  consecutive  images  appeared  to  have  no  impact  on  the  vanishing  point 
measurements  obtained  from  them,  even  though,  when  closest  to  the  camera,  he  or 
she  obstructed  approximately  10%  of  the  camera’s  view.  This  one  example  is  not 
sufficient  to  demonstrate  the  vanishing  point  detection  method’s  robustness  to  such 
disturbances,  nor  was  making  such  a  determination  an  objective  of  this  research.  How¬ 
ever,  an  investigation  into  the  impact  of  different  amounts  and  types  of  obstructions 
on  the  vanishing  point  detection  process  could  be  insightful. 

5.2.3  Motion  Profiles.  This  research  explored  only  a  single  motion  pro¬ 
file  through  a  pair  of  hallways  which  included  a  single,  90-degree  clockwise  change 
in  heading.  Additional  profiles  were  not  examined  due  to  time  and  resource  limi¬ 
tations.  However,  investigating  other  profiles  including  counter-clockwise  turns  and 
entering/exiting  rooms  adjacent  to  the  hallway  could  further  demonstrate  the  utility 
of  this  attitude  estimation  method.  Furthermore,  additional  profiles  may  provide  more 
insight  into  the  cause  and  nature  of  the  measurement  bias  discussed  in  Section  14.3.21 
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5.2.4  Real-time  Implementation.  The  attitude  estimates  generated  during 
this  research  effort  were  all  obtained  by  post-processing  the  optical  and  inertial  data, 
bnt  in  order  to  be  implemented  aboard  an  aerial  vehicle,  this  task  must  be  accom¬ 
plished  in  real  time.  The  microcomputer  onboard  the  intended  quadrotor  host  vehicle 
has  limited  computational  capability  and  runs  a  different  operating  system  than  the 
computer  used  to  process  the  data.  Real-time  implementation  of  the  attitude  estima¬ 
tion  method  presented  in  this  thesis  would  likely  require  that  the  Matlab  code  used 
to  implement  the  Kalman  filter  be  optimized  for  faster  processing  and  ported  to  the 
C  computing  language  for  on-vehicle  use.  Once  these  tasks  were  accomplished,  an 
investigation  into  the  latency  in  processing  images  in  real  time  could  help  determine 
the  feasibility  of  onboard  attitude  estimation  using  the  methods  presented  in  this 
thesis. 

5.2.5  Non-Manhattan  World.  The  attitude  estimation  method  presented  in 
this  thesis  has  been  founded  on  the  Manhattan  world  assumption.  While  some  man¬ 
made  environments  conform  to  this  model,  most  scenes  do  not  contain  many  parallel 
planar  surfaces.  While  images  of  these  non-Manhattan  scenes  will  not  contain  groups 
of  parallel  lines,  the  concept  of  a  vanishing  point  is  still  valid.  Identifying  the  vanishing 
directions  for  a  non-Manhattan  world  scene  is  a  much  more  difficult  task,  since  the 
intersections  of  straight  lines  cannot  be  used.  However,  if  the  vanishing  directions 
were  identified,  this  information  could  still  be  combined  with  inertial  measurements 
to  obtain  accurate  attitude  estimates. 

5.3  Closing 

This  research  has  presented  one  way  in  which  inertial  and  optical  sensors  can 
be  combined  to  provide  improved  navigation  information.  As  this  technology  contin¬ 
ues  to  be  developed,  an  equally-precise  alternative  to  satellite-based  navigation  may 
ultimately  be  achieved.  Only  time  will  tell  where  this  field  of  science  will  lead. 
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